Dramatic advances in the ability to gather, store, and process data have led to the rapid growth of data science and its mushrooming impact on nearly all aspects of the economy and society. Data science has also had a huge effect on academic disciplines, bringing new research agendas, new degrees, and new organizational entities.
Recognizing the complexity and impact of the field, Alfred Spector, Peter Norvig, Chris Wiggins, and Jeannette Wing have completed a new textbook on data science, Data Science in Context: Foundations, Challenges, Opportunities, published in October 2022.6 With deep and diverse experience in both research and practice, across academia, government, and industry, the authors present a holistic view of what is needed to apply data science well.
Ben Fried, a venture partner at Rally Ventures and formerly Google’s CIO for 14 years, and Michael Tingley, a software engineering manager at Meta, gathered the authors together as they were finishing up the manuscript to discuss the motivation for their work and some of its key points.
Norvig is a Distinguished Education Fellow at Stanford’s Institute for Human-centered Artificial Intelligence and a research director at Google, Spector is a visiting scholar at MIT with previous positions leading engineering and research organizations, Wiggins is an associate professor of applied mathematics at Columbia University, and Wing is executive vice president for research and professor of computer science at Columbia University.
BEN FRIED: You have come to data science from very different backgrounds. Was there a shared inspiration to write the book?
ALFRED SPECTOR: In one way or another, I think we all saw a deep and growing polarity in data science. On the one hand, it has enormous, unprecedented power for positive impact, which we’d each been lucky enough to contribute to. On the other hand, we had seen serious downsides emerge even with the best of intentions, often for reasons having little to do with the technical skills of the practitioner. There are many excellent texts and courses on the science and engineering of the field, but it seems like there is something in the headlines every day that demonstrates there is an urgent need to educate on what you, Ben, have called the “extrinsics” of the field.
PETER NORVIG: Throughout the rapid growth in applications of data science, there have been serious issues to confront: click-fraud, the early Google bombs, data leaks, abusive manipulation of applications, amplification of misinformation, overinterpretation of correlations, and so many more—all things we read about daily. Some problems are more serious than others, but we feel education will help us to lessen their frequency and severity, while simultaneously allowing us to understand their significance.
FRIED: Why the word Context in the title of your book?
CHRIS WIGGINS: It was our primary motivator. In a nutshell, we wanted to provide some inclusive “context” for the data-science discipline. We felt the term data science is often used too narrowly.
SPECTOR: We think of context in three ways.
It refers to the topics beyond just the data and the model. These include dependability, clarity of objectives, interpretability, and other things I’m sure we will get into.
It also refers to the domain in which data science is being applied. What is crucial for certain applications isn’t needed for others. Teams practicing data science must be particularly sensitive to the uses to which their work will be placed.
Finally, context refers to the societal views and norms that govern the acceptance of data-science results. Just as we have seen changing views and norms regarding privacy and fairness, data science will increasingly be expected to solve challenging problems, where societal views vary by region and over time. Some of these problems are “wicked,” in C. West Churchman’s2 language, and they are so very different from the problems that computing first addressed.
JEANNETTE WING: While data science draws from the disciplines of computer science, statistics, and operations research to provide methods, tools, and techniques we can apply, what we do will vary according to whether we're working on a healthcare issue, something related to autonomous driving, or perhaps exploring some particular aspect of climate change. Just as each discipline comes with its own constraints, so does each of these problem domains. That is why the application of data science is largely defined by the nature of the problem we're looking to solve or the task we're trying to complete.
NORVIG: Beyond this, I personally wanted to reach a broader audience than I had with my more mathematical and algorithmic textbook. To do data science, we need to know many techniques, but we also need to be conversant with larger, societal issues. We all shared this motivation.
FRIED: All this leads to the question of how you define data science.
WING: By the time Alfred and I first started talking about working on a book, I was already writing papers and giving talks where I defined data science as the study of “extracting value from data.” But we agreed that this definition was too high level and insufficiently operational.
SPECTOR: So, we started with “extracting value from data,” then added prose to address the two personalities of the field—one where data is used to provide insight to people (as in many uses of statistics) and the other having to do with data science’s ability to enable programs to reach conclusions.
WIGGINS: We also recognized we needed a capacious definition (see sidebar “Definition of Data Science“) to respect what people are doing in the name of data science within industry and academia, as well as the rapidity of change in the field.
FRIED: It’s a very fluid definition. Not only does data science mean different things to different people, it also has fuzzy boundaries.
WIGGINS: Exactly! We’re at that time in the creation of a new field where it does have fuzzy boundaries. It touches on many different subjects: privacy/security, resilience, public policy, ethics, and so on. But it’s also clearly taking form with the creation of job titles, degrees, and departments. We saw an opportunity to take a stab at defining its breadth—starting with the diverse challenges its practitioners must overcome.
MICHAEL TINGLEY: Do you make a distinction between data science and machine learning?
SPECTOR: As a domain, data science is broader than machine learning, in that machine learning is only one of the techniques it employs. Data science encompasses many techniques from statistics, operations research, visualization, and many more areas: in fact, all the things needed to bring insights and conclusions to a worthwhile end. That being said, the revolutionary growth in machine learning has absolutely catalyzed the most change: incredible successes but some challenges too.
NORVIG: One difference is that, in the machine-learning arena, a researcher's focus might be to write a paper that touts some new algorithm or some tweak to an existing algorithm. In the data-science sphere, by contrast, research is more likely to present a new dataset and show how to apply a collection of techniques to it.
FRIED: So, you were motivated by the breadth of challenges we face. Where did you end up? Are there approaches that can help?
NORVIG: After lots of give and take, we came up with something we call an analysis rubric (see sidebar) where we enumerate the elements a data scientist needs to take into account.
As Atul Gawande writes in The Checklist Manifesto,3 checklists such as our rubric make for better solutions, and we hope ours might help people avoid some of the mistakes we have made in past projects. But because each project is different, it's hard to come up with one checklist that will work across all of them, so we'll see how well it stands the test of time.
SPECTOR: Let’s be specific. The analysis rubric addresses the challenges in seven categories. Some relate more to how we implement or apply data science. The others relate more to the requirements we are trying to satisfy.
NORVIG: The rubric starts with data: getting and storing it, wrangling it into a useful form, ensuring privacy, ensuring integrity and consistency, managing sharing and deletion, and so on. In some ways, this may be the hardest part of a data-science project.
For me, the first big revelation of data science was that data can be a key asset that offers real value.4 But, the second revelation was that data can be a liability if you’re not a good shepherd for it.
FRIED: Are there hidden costs to holding onto data?
NORVIG: I've learned something in this regard from all the efforts that have been made in recent years to advance federated learning. In earlier days, if a team wanted to build a better speech-recognition system, it would import all the data into one location and then train and optimize a model there until it had something it could launch to users. But that meant holding onto all these people's private conversations, with concomitant risks. As a field, we decided it would be best not to hold onto that information, but instead to optimize the model on each person's data privately and find some clever way to combine those individual optimizations across many people in a federated learning framework. This federated approach seems to be working out pretty well. The privacy concerns have ended up leading to a pretty good scientific advancement.
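To make the federated idea concrete, here is a minimal sketch of federated averaging, one widely used federated learning scheme; the interview does not say which algorithm any particular system used. It assumes a toy linear model y ≈ w·x + b and three hypothetical clients whose data never leaves their own `local_update` call. Only model parameters travel to the server, which averages them weighted by each client's example count. The function names, learning rate, and round counts are illustrative assumptions, not anything from the book.

```python
def local_update(weights, data, lr=0.2, epochs=5):
    """Train on one client's private (x, y) pairs with plain SGD on squared error.

    Only the updated parameters are returned; the raw data stays with the client.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error for this example
            w -= lr * err * x
            b -= lr * err
    return w, b


def federated_averaging(clients, rounds=30, init=(0.0, 0.0)):
    """Server loop: broadcast the global model, collect locally trained copies,
    and average them weighted by how many examples each client holds."""
    global_model = init
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [(local_update(global_model, data), len(data)) for data in clients]
        w = sum(model[0] * n for model, n in updates) / total
        b = sum(model[1] * n for model, n in updates) / total
        global_model = (w, b)
    return global_model


# Hypothetical example: three clients each hold every third point of y = 2x + 1.
points = [(i / 30, 2 * (i / 30) + 1) for i in range(30)]
clients = [points[k::3] for k in range(3)]
w, b = federated_averaging(clients)
print(f"w = {w:.3f}, b = {b:.3f}")  # should approach w = 2, b = 1
```

Real deployments typically layer further protections on top of this basic loop, such as secure aggregation or differential privacy, because model updates can still leak information about the data behind them.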
SPECTOR: Our second rubric element is the most obvious. There needs to be a technical approach, which can come from machine learning, statistics, operations research, or visualization. This offers a way to provide valuable insights and conclusions, whether predictions, recommendations, or other kinds of results.
It isn’t easy to find a model in some situations. Sometimes there is just too much inherent uncertainty, and other times the world may continually change and make modeling efforts ineffective. Some situations are game-theoretic, and a model’s conclusions themselves generate feedback that makes the world less predictable.
One example of the limitations of modeling has been the effort to predict what might happen due to Covid-19. For many reasons, including limitations of data, rapidly changing policy, variations in human behavior, and virus mutations, the ability to make long-term predictions of mortality has been poor.
FRIED: Are you saying data science didn’t help at all in the war on Covid?
NORVIG: I was involved in a project with an intern and some statisticians at UC Berkeley where we were trying to give hospitals three days' advance notice of how many staffers they would need to bring in. We couldn't give them accurate predictions 30 days in advance, but we could make useful short-term predictions.
WING: And for sure, data science was applied successfully in many other areas, most obviously in the vaccine and therapeutics trials.
FRIED: We could devote our whole time to models, but given the topic’s broad coverage, let’s move to the next rubric element: dependability.
WING: With data science being used in ever more important ways, dependability is of increasing importance, and we include four subtopics under it: Are the privacy implications of data collection, storage, and use acceptable? Are the security ramifications for the application acceptable, given the likelihood that attacks may release data or impair an application’s correctness or availability? Is a system resilient in the face of a world that is continually changing and with modeling techniques we may not fully understand? Finally, is the resulting system sufficiently resistant to the abuse that has savaged so many applications?
WIGGINS: We should note the tensions within the dependability components. The push for privacy versus the need to provide security is an example. End-to-end encryption would reduce risks to privacy and keep providers from seeing private messages, but it would also limit platforms’ abilities both to respond to law enforcement requests and to perform content moderation. There definitely are some unresolved tensions here.
TINGLEY: Getting privacy, security, resilience, and abuse resistance right is a good start and a formidable challenge in itself. Is that enough to allow people to trust the applications of data science?
SPECTOR: It’s probably not enough. Developers, scientists, and users must have sufficient understanding of data-science applications, particularly in increasingly sensitive situations. The general public and policymakers also need to have more understanding, given the pervasive impact.
This leads to the rubric topic of understandability, which has three categories: Must a model’s conclusions be interpretable—that is, should the application be able to explain “why?” Must conclusions prove causality, or is correlation sufficient? And must data-science applications, particularly in the realms of science and policy, make their data and models available to others so they can test for reproducibility?
Where data science is employed in research, the tradition is that others must be able to reproduce work so they can test and validate it. This is very hard to accomplish when we’re dealing with massive volumes of data and complex models.
NORVIG: Understandability has been particularly hard with machine learning, but contemporary research is making progress—for example, with visualization and what-if analysis tools. While causality is difficult to show with only retrospective data, the causal inference work from the statistics community can reduce the amount of additional experimentation needed to demonstrate it.
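The gap between correlation and causation can be seen in a few lines of arithmetic. The sketch below uses invented counts (not data from the book) in which a hypothetical treatment helps patients within each severity stratum, yet looks harmful overall because sicker patients are more likely to receive it. Stratifying on the confounder, a basic form of the backdoor adjustment from the causal inference literature, recovers the within-stratum effect. It assumes severity is the only confounder; if other confounders go unmeasured, retrospective adjustment like this is not guaranteed to be right.

```python
from collections import defaultdict


def make_rows(z, t, n, recovered):
    """n records of (stratum z, treated t, outcome y), `recovered` of them with y = 1."""
    return [(z, t, 1)] * recovered + [(z, t, 0)] * (n - recovered)


def rate(rows):
    """Fraction of records with outcome y = 1."""
    return sum(y for _, _, y in rows) / len(rows)


def naive_difference(rows):
    """Outcome rate among treated minus untreated, ignoring the stratum."""
    treated = [r for r in rows if r[1] == 1]
    untreated = [r for r in rows if r[1] == 0]
    return rate(treated) - rate(untreated)


def adjusted_difference(rows):
    """Compare treated vs. untreated within each stratum, then weight each
    stratum's difference by its share of the population."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[0]].append(r)
    effect = 0.0
    for group in strata.values():
        treated = [r for r in group if r[1] == 1]
        untreated = [r for r in group if r[1] == 0]
        effect += len(group) / len(rows) * (rate(treated) - rate(untreated))
    return effect


# Invented counts: z = 1 marks sicker patients, who are treated more often.
rows = (make_rows(z=0, t=1, n=20, recovered=18) + make_rows(z=0, t=0, n=80, recovered=64)
        + make_rows(z=1, t=1, n=80, recovered=40) + make_rows(z=1, t=0, n=20, recovered=8))

print(round(naive_difference(rows), 2))     # treatment looks harmful overall
print(round(adjusted_difference(rows), 2))  # but helps within each stratum
```

With these counts, recovery is 58 percent among treated patients versus 72 percent among untreated ones, a naive difference of -0.14, while the stratified estimate is +0.10.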
SPECTOR: Here's a real-world example from about 10 years ago when I was at Google. Some argued it might be better for societies to measure and then maximize happiness rather than, say, per capita GDP (gross domestic product). Catalyzing this interest, perhaps, was Bhutan's then-recently introduced Gross National Happiness index. Some believed that Google could glean a happiness score from the collective searches of a population. Before we proceeded too far, we realized there was a big gotcha: The score would be so influential that Google would need to explain to the public how it was calculated. If the mechanism were fully explained, however, people would want to abuse it, and that would render it invalid. While there was data and (likely) a model, understandability and then dependability concerns eventually torpedoed the effort.
TINGLEY: This naturally leads to the question of setting precise goals. Are the objectives of the system an immutable, external property, or is there also some emergent property in how the system or its context evolves?
SPECTOR: The next rubric element relates to having clear objectives. Do we really know what we’re trying to achieve? Requirements analysis has always been needed in complex systems, but many uses of data science are extremely challenging. They require the balancing of near- and long-term objectives, the needs of different stakeholders, and so on. There may not even be societal consensus on what we should achieve. For example, how much fun—or how addictive—should a video game be? Which recommendations to a user are beneficial versus which might prove distracting in the wrong situations? Are some downright harmful?
As already mentioned, a society’s norms may change over time. It’s hard to anticipate everything, but we should try to think about the downside risks posed by aspects of a particular design. We advocate that these risks be made as explicit as possible.
WIGGINS: Beyond that, we need to be prepared to monitor the way a data product is used and to mitigate its harms. A video-game maker years ago may not have anticipated that some people now would consider their product to be addictive for young children. Mitigating harms, in this case, may mean design changes that prevent or lessen extended play or other signs of addictive behavior. Even then, not everyone in the company that made the game might agree this is a problem. A company committed to ethical data products, however, takes this seriously.
SPECTOR: An objectives-related topic unto itself is the incentive structure that data science makes feasible. Given the ability to measure and optimize almost anything, are we optimizing the right things? Which incentives should be built into systems to guide individuals, organizations, and governments in the best way?
FRIED: Where does fairness come into this? It’s critically important and very complex. Is there even agreement on what’s fair and what isn’t? Won’t those opinions change over time?
SPECTOR: Fairness is addressed in two ways in our rubric. First, it’s an implementation-oriented topic: Data collection and models need to be built and indeed tested to be sure they work well, not just on average but for subpopulations. Societal priorities proscribe conclusions that are reached based on subgroups’ protected attributes.
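Testing "not just on average but for subpopulations" can start as simply as disaggregating an evaluation metric. This minimal sketch assumes accuracy as the metric, hypothetical group labels, and an arbitrary 5-point tolerance. Real fairness reviews usually examine several metrics (for example, each type of error rate, or calibration), and which ones matter is itself a goal-setting question.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each subpopulation label."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def flag_gaps(per_group, overall, tolerance=0.05):
    """Groups whose accuracy trails the overall figure by more than `tolerance`."""
    return sorted(g for g, acc in per_group.items() if overall - acc > tolerance)


# Hypothetical labels, predictions, and group memberships.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]
groups = ["A"] * 6 + ["B"] * 4

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall, per_group, flag_gaps(per_group, overall))
```

Here an overall accuracy of 80 percent hides a split of 100 percent for group A and 50 percent for group B, so only B is flagged.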
WING: On top of the typical software engineering challenge of making sure the model is working properly, we need to pay great attention to training data. This is pretty new for software engineers.
SPECTOR: I like to say that when systems learn from data, “the past may imprison the future,” thereby perpetuating unwanted behaviors.
Beyond these data and implementation challenges, the second fairness challenge is in goal-setting. There are complex ethical, political, and economic considerations about what constitutes fairness.
WIGGINS: Ultimately, this comes down to the objective of trying to gain value, which is a key word in our data-science definition, since it comes with both an objective meaning and a subjective meaning. That is, beyond whatever mathematical value we’re trying to calculate or optimize, there’s what we or our society may value. In part, I think this speaks to the fact we’re now making data-science applications that have more and more impact on society. Going back to context, you have to think about what constitutes a success, and that can be complicated.
As Alfred has observed, this involves deciding on the goal or objective function we’re trying to optimize while acknowledging what we are omitting. It’s very hard to consider all the possible edge cases and human impacts of some data-science applications.
WING: On a related topic, in our next rubric element we examine whether the data-science application is innately failure-tolerant, given that the objectives a system meets may not be perfectly defined, and they may be achieved only probabilistically. Self-driving cars, for example, aren't particularly failure-tolerant, whereas advertising would seem more so. But even some advertising applications of data science can be intolerant of failures; for example, it's important to identify foreign sources of election advertising revenue and to abide by regulations governing the advertising of certain products.
FRIED: What about the last rubric element?
WIGGINS: With data-science applications affecting individuals and societies, they must take into account ethics, as well as a growing body of regulations. These are covered in the ethical, legal, and societal implications element (as shown in the accompanying table).
Table. Illustration of the Analysis Rubric Elements
SPECTOR: Indeed, the body of laws governing many data-science uses is already quite large. Furthermore, there are broad societal implications; for example, data science almost certainly is altering the employment landscape and having effects on societal governance.
TINGLEY: As a practitioner, I think it’s wonderful to have some guiding principles like the rubric to think about. In practice, however, it’s sometimes difficult to anticipate these issues up front and perform risk assessments or even guess at some of the longer-term outcomes. For example, thinking about all the potential ethical implications of something before you even know where your investigation might lead is really challenging.
My question is: To what extent do we as practitioners bear responsibility for exhaustively analyzing and estimating these sorts of issues in advance? Isn’t it inevitable that much of this work is going to end up being guided by retrospective analysis once we’ve figured out where we’ve landed?
SPECTOR: Compounding the challenge you raise, the world might change just because of the launch, meaning the very existence of a data-science application changes the ground rules that guided its development. As an example, the world may become dependent on some application, which would result in increased dependability requirements.
WIGGINS: Then there’s also the matter of maintaining and monitoring a data product. It’s not possible to know what all the possible failure modes are before a launch, but there are plenty of opportunities to maintain and monitor a product as the world changes and potential harms are made clear.
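One lightweight way to act on this is to compare the distribution of a model's inputs or scores in production against what it saw at launch. The sketch below computes the population stability index (PSI), a drift statistic common in credit-risk modeling; it is one choice among many (a Kolmogorov-Smirnov test is another), not something the authors prescribe. The bin edges, the epsilon floor, and the example data are assumptions for illustration.

```python
import math
from bisect import bisect_right


def bin_proportions(values, edges, eps=1e-6):
    """Share of values falling in each bin defined by sorted cut points `edges`.

    Empty bins are floored at `eps` so the logarithm below stays defined.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline, current, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical model scores at launch versus later, after the inputs have drifted.
edges = [0.25, 0.5, 0.75]
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + v / 2 for v in baseline]

print(population_stability_index(baseline, baseline, edges))  # no drift
print(population_stability_index(baseline, shifted, edges))   # large drift
```

A commonly quoted rule of thumb treats a PSI above about 0.25 as a significant shift, but where to set an alert threshold is a judgment call, and drift in the inputs does not by itself tell you whether harm is occurring.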
WING: We hope practitioners will end up using the analysis rubric as a checklist during many stages of a project. Some things ought to be easy enough to consider before building a model, but then further assessment will also be required after the model is built. With data science, it’s even less likely that you’ll be able to anticipate everything in advance than it is with more traditional software.
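If a team wants to track the rubric as a checklist across project stages, a small data structure is enough. The sketch below is hypothetical: the seven element names and their questions are paraphrased from this interview, the book's own wording may differ, and `RubricReview` is an invented helper. Each stage (before modeling, after modeling, after launch) gets its own review, so open questions can be revisited as understanding improves.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the seven rubric elements. Names and questions are
# paraphrased from this interview; the book's own wording may differ.
RUBRIC = {
    "data": [
        "Can the data be obtained, stored, and wrangled into a useful form?",
        "Are privacy, integrity, sharing, and deletion handled?",
    ],
    "technical approach": [
        "Is there a model (ML, statistics, OR, visualization) that yields useful conclusions?",
    ],
    "dependability": [
        "Are the privacy implications acceptable?",
        "Are the security ramifications acceptable?",
        "Is the system resilient to a changing world?",
        "Is the system resistant to abuse?",
    ],
    "understandability": [
        "Must conclusions be interpretable?",
        "Must conclusions show causality, or is correlation sufficient?",
        "Must data and models be shared so others can reproduce the work?",
    ],
    "clear objectives": [
        "Do we know what we are trying to achieve, and for which stakeholders?",
        "Have fairness goals been made explicit?",
    ],
    "failure tolerance": [
        "Can the application tolerate failures, given imperfect, probabilistic objectives?",
    ],
    "ethical, legal, and societal implications": [
        "Which ethical principles and regulations apply?",
    ],
}


@dataclass
class RubricReview:
    """One pass over the rubric at a given project stage (hypothetical helper)."""
    stage: str
    notes: dict = field(default_factory=dict)  # question -> reviewer's note

    def record(self, question: str, note: str) -> None:
        """Attach a note to a rubric question; reject questions not in RUBRIC."""
        if not any(question in qs for qs in RUBRIC.values()):
            raise KeyError(f"not a rubric question: {question!r}")
        self.notes[question] = note

    def open_items(self) -> list:
        """Return (element, question) pairs not yet addressed at this stage."""
        return [(element, q) for element, qs in RUBRIC.items()
                for q in qs if q not in self.notes]


# Usage: a fresh review per stage, e.g. before modeling and again after.
before = RubricReview(stage="before modeling")
before.record(RUBRIC["failure tolerance"][0], "Low stakes: ranking suggestions only.")
print(len(before.open_items()), "questions still open")
```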
SPECTOR: This emphasizes the role for product managers, who are tasked with looking at a project broadly. Their role becomes all the more critical as projects come to be less dominated by technology. In fact, if you talk to many product managers today, you will hear them say things like, “Our engineers started on this effort, particularly the machine learning, and they did a lot of work without pausing to think about all the other challenges they were likely to encounter. And I really wish they’d talked about that earlier because it would have saved us a lot of rework.” That being said, as Chris intimated, we don’t think everything should be approached with a waterfall methodology. There’s plenty of interaction and adaptation required.
FRIED: Let’s spend some more time on your work on ethics.
WING: While we could have kept the discussion of ethics implicit in the other rubric elements, such as our discussions of how to set good and fair objectives, Chris and I, in particular, wanted to focus on ethics explicitly. We decided to start with the Belmont principles5 as a basis and see how far they would take us. I’d say they have actually stood up pretty well so far.
FRIED: What are the Belmont principles, and how do you apply them?
WIGGINS: The Belmont principles were effectively an attempt to create a U.S. government specification for ethics. In response to serious ethical breaches in taxpayer-funded research, Congress in the 1970s created a diverse commission of philosophers, lawyers, policymakers, and researchers to figure out what qualifies as ethical research on human subjects. After years of discussion, the commission announced that its focus would turn to articulating a set of principles that would at least provide a common vocabulary for people who attempt to make a good-faith adjudication as to what qualifies as ethical behavior. The principles themselves are:
Respect for persons, ensuring the freedom of individuals to act autonomously based on their own considered deliberation and judgments.
Beneficence, that researchers should maximize benefits and balance them against risks.
Justice, the consideration of how risks and benefits are distributed, including the notion of a fair distribution.
These principles were ultimately released by the U.S. government in 1978, and they have since been used as a requirement in some federal funding decisions. One exploration in our book is how these principles remain useful for thinking through ethical decisions that researchers and organizations must make in data-science research and in developing data products.
FRIED: Are there any contemporary examples of how the Belmont principles are being applied?
SPECTOR: Perhaps the intense discussion of Covid-19 vaccination for young children is illustrative of the give and take. While it's currently believed that vaccinating a young child may be of only modest benefit to the child, the hope has been that having fewer infectious children may reduce Covid-19 among the elders with whom they come in contact.
This pretty explicitly shows the trade-offs: Respect for persons might argue we would not seek to vaccinate the child, since the vaccine is of unclear benefit and the child may be too young to provide informed consent. On the other hand, the principle of beneficence might win the day, given the potential for saving the lives of many grandparents. In a perfect world, this would be informed by good statistics.
In any case, it illustrates the sorts of challenges policymakers and parents face. We all believe that the explicit give and take of the Belmont principles in such situations ultimately provides better, and more transparent, decisions.
FRIED: Do you have an example more related to data science?
SPECTOR: Earlier in the discussion, Jeannette noted that self-driving cars are not naturally failure-tolerant. Interesting ethical questions—as well as some practical ones—come up around this since it’s unlikely a self-driving car will ever be 100% safe in all circumstances. We’ll face the question of what constitutes an acceptable failure rate as the technology gets closer to mass adoption. That is, how much risk are we willing to accept? Auto accidents currently account for around 40,000 deaths per year in the U.S. alone, but if perfection is required, we probably won’t ever be able to deploy the technology.
NORVIG: We are quite inconsistent as a society when it comes to what we will accept and what we won't accept. While the debate over self-driving cars continues to rage, I happen to know some people who are working on self-flying cars. I find it perplexing that, as a society, we have apparently decided that 40,000 road deaths a year is OK, while the number of air-travel deaths ought to be zero. Accordingly, the legal requirements imposed by the FAA are far more stringent than those applied to road travel. And we need to ask whether that's really a rational way to run our society, or whether we should instead be making some different trade-offs.
FRIED: The sphere of ethics is inherently qualitative, whereas computing is a highly quantitative practice. I’ve witnessed discussions that diminish qualitative standards because they can’t be measured and have no objective function. Given that, are you worried about uptake of these principles?
WIGGINS: In my experience, software engineers love to talk about design principles. Alfred mentioned the waterfall model, for example, and design methodologies like that are pretty qualitative. Engineers are already dealing with principles that get debated regularly and changed with some frequency.
FRIED: Are the Belmont principles sufficient for any ethical question?
SPECTOR: While we focus on the Belmont principles, we also acknowledge that individual and organizational decision-making will take other frameworks into account. I call out three:
First, there are professional ethics, like the ACM code of ethics.1 Truthfulness, capability, and integrity must be a given as we apply data science.
Second, certain situations have different ethical standards. The war in Ukraine has made stark for us the laws of war, so-called jus in bello, and their implications.
Third, decisions are made in an economic framework, where the economic system exists to channel energy, competition, and self-interest into benefits for individuals and society.
WIGGINS: We want to remind everyone that it’s not enough to have principles. Each individual and organization applying data science needs to come up with organizational structures and approaches to incorporate them into their process.
WING: The academic community is taking this seriously. We saw an opportunity to put a stake in the ground by telling students, "If you want to be a data scientist, you are going to learn about ethics along with all this quantitative stuff." The Academic Data Science Alliance, which began a few years ago, emphasizes ethics sufficiently that I believe ethics courses are now integral to most academic programs in the discipline. I'm very encouraged by this: data science is only beginning to emerge in academia, and we are already incorporating these qualitative ethical principles as integral to the field.
NORVIG: This is just part of being in a field that’s finally growing up. When the work you are doing is only theoretical or academic, then you go ahead and publish your papers and it really doesn’t matter. But once that field starts to make a genuine impact on the world, you suddenly find you have some serious ethical responsibilities.
FRIED: Looking at the other side of the coin, should an understanding of data science inform a liberal arts education that includes some exposure to ethics?
WIGGINS: Having taught a class on the history and ethics of data,7 I can tell you that humanities students show a tremendous interest in learning about it. And our engineering students even demand that we focus on the ethical aspects. You can imagine people who would like the topic to be taught as if it dwelled solely in the Platonic realm of pure thought. You can also imagine there are other people who would want us to focus more on the very applied and perhaps even product-driven aspects of the topic. I’ve found it useful to teach things historically to provide a structure to these different interests.
NORVIG: While it’s important to raise these issues and to have general principles, it’s also important to have case law based on real-world examples. That is, in our legal system we have laws that people take great care to write as clearly as they can, but they can’t anticipate all the possibilities that might surface later. We supplement the laws with case law.
It’s one thing to say that privacy and personhood are important rights. But then how does that apply to the use of surveillance cameras? You can’t really answer that just from general principles. You need to get more precise by specifying the types of uses that are approved and those that aren’t. Principles are a good starting point, but we also need the specificity that examples offer.
FRIED: Now I have an engineering question for you: Is scale inherent to data science?
NORVIG: Yes. If it hadn’t been for big data, we wouldn’t be talking today about data science as a separate field. Instead, it would still be part of statistics. While the folks in statistics were focused on whether you needed 30 or 40 samples to achieve statistical significance, there were some other people who were saying, “Well, we have a billion samples, so we are not going to worry about that. Instead, we have a few other problems and we are going to focus on them.” Those issues became the focus of the new field.
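The contrast between 30 or 40 samples and a billion can be made concrete with the textbook standard error of a proportion. In the sketch below, the helper name is hypothetical and the normal approximation with z = 1.96 is a standard simplification. It shows sampling uncertainty shrinking with the square root of n: roughly ±0.18 at 30 samples and ±0.15 at 40, but about ±0.00003 at a billion. The formula assumes independent, representative samples, so at that scale the remaining problems are mostly of another kind, such as biased or untrustworthy data.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion p
    estimated from n independent samples (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# p = 0.5 is the worst case: it maximizes p * (1 - p).
for n in (30, 40, 1_000_000_000):
    print(f"n = {n:>13,}: +/- {margin_of_error(0.5, n):.6f}")
```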
WING: However, we can do plenty of data science at a smaller scale with what some people call “artisanal data” or “precious data.” There are plenty of challenges to contend with in that space since it often involves working with combined datasets, which means dealing with all the issues that go along with heterogeneous data. So, we still have some fundamental scientific and mathematical questions to address, whether we are working with big data or heterogeneous small data.
SPECTOR: A side effect of all this data is that we are all regularly confronted with both meaningful and not-so-meaningful details that are hard to put into context. Viewed through the understandability element of our rubric, the sheer volume of data and conclusions we get every day is difficult for even experts to understand. In particular, we are often presented with correlations whose meanings are not as far-reaching or conclusive as we are led to believe. All of the technology for capturing, storing, and locating data makes it far easier to cherry-pick data and use it out of context to advance erroneous points of view.
NORVIG: Also, whenever data is derived from human interactions with various systems, there is a challenge to determine how much of it is trustworthy. For example, if you’re working with a lot of data that comes from observations of what people are clicking on, it might be tempting to assume they are clicking on things they are truly interested in. We humans have our frailties and biases, which means our actions don’t always reflect our own best interests. We also have lapses, in the sense that people click on things without meaning to. It’s important to understand those limitations in order to interpret the data better.
FRIED: Given all of this, what concerns should we have about how data science allows us to derive answers and benefits based on user interactions, especially given that those interactions can change over time without the model’s creator being aware?
NORVIG: This certainly presents a big challenge. We need to recognize we’re in a game-theory situation where, when you make a move, other people are going to make a move in response, whether they are spammers or legitimate participants in the ecosystem. This sort of runs counter to big data since, even if you have millions of clicks, you won’t have any clicks for what happens after you reach and disseminate a conclusion.
You don’t know how people are going to change their strategies. You have no data on that whatsoever. There’s this tension between the things for which you can measure everything and know exactly what’s going on and the things in the future that may end up messing with your normal business model in unknown ways. Then there’s also the possibility you will have changed the ecosystem in ways you don’t understand.
SPECTOR: This applies to finance as well, of course. If you are applying algorithmic approaches to buying and selling and your activities are having an impact on the market, you can’t be certain exactly what effect your purchase or sale might have.
FRIED: Which is why analyses based on historical data have flaws. “Past performance may not be indicative of future results,” as all the brokerage houses are quick to remind you.
WIGGINS: If I can inject one broader aspect of scale, it also has an ethical valence. Big systems that operate at scale can have a far-ranging, global impact.
WING: From the engineering perspective, scientists have their own concerns. Often, they are working with massive amounts of data from sophisticated instruments such as the IceCube Neutrino Observatory in Antarctica or the James Webb Space Telescope. And, from what my scientist colleagues tell me, they need new techniques for storing, preserving, and analyzing data.
TINGLEY: What about the software engineering of data science?
SPECTOR: It’s hard to build quality software under even the very best of circumstances. Data science adds a new level of challenge, because we are now using modules that are learning from data, and they may work well in some contexts and not in others. We may have confidence they are likely to work well for an average case, but we don’t know exactly how well they will work for certain inputs, and, again, we don’t know how well they will work over time.
WING: Having once been involved in the formal verification community, let me restate what Alfred said more formally. To show that a program was doing the right thing, we would use a very strong theorem (for all x, P(x)) to over-prove the point. Then, once that had been demonstrated, we could be certain the computer would do exactly what we had intended for any valid input.
But for machine-learned models, universal quantification is too strong and unrealistic. We wouldn’t say for all x, P(x), since we do not intend that a machine-learned model should work for all possible data distributions. Rather than proving for all x, P(x), we could focus on proving the property for all data distributions within a certain class, but then we would need to characterize the class.
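As an illustration of the two styles of specification Wing contrasts, here is one possible formalization. The model f, the input space X, the distribution class \mathcal{C}, and the tolerance \epsilon are illustrative symbols chosen for this sketch, not notation taken from the book, and the probabilistic form is only one of several ways such a property could be stated.

```latex
% Traditional program verification: property P holds for every valid input x
\forall x \in X.\; P(x)

% Machine-learned model f: P is required only for distributions D in a chosen
% class \mathcal{C}, here phrased as "P holds with probability at least 1 - \epsilon"
\forall D \in \mathcal{C}.\; \Pr_{x \sim D}\left[ P(f, x) \right] \ge 1 - \epsilon
```

The hard part, as the discussion notes, is characterizing \mathcal{C} itself.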
For robustness, we might say for all norm-bounded perturbations to characterize the class of data distributions for which a model is robust. But what about a property such as fairness? This soon becomes very tricky to formalize. A practical consequence is that we need to increase testing, recognizing that, as in traditional software engineering, we’re never going to be able to test everything that’s likely to crop up in real life. This illustrates why trustworthiness is an important research frontier.
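The gap between proving robustness and testing for it can be sketched in a few lines. The example below assumes a robustness property of the form "the prediction at x does not change for any perturbation whose largest coordinate is at most epsilon" (an L-infinity ball), and checks it by random sampling. The linear classifier and the function names `linear_classifier` and `find_counterexample` are hypothetical stand-ins, not anything from the book. Sampling can expose a counterexample, but finding none is evidence, not proof.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], int]


def linear_classifier(weights: Vector, bias: float) -> Model:
    """Hypothetical stand-in for a learned model: predicts 1 if w.x + b > 0, else 0."""
    def predict(x: Vector) -> int:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score > 0 else 0
    return predict


def find_counterexample(model: Model, x: Vector, epsilon: float,
                        trials: int = 1000, seed: int = 0) -> Optional[List[float]]:
    """Randomly sample perturbations delta with max_i |delta_i| <= epsilon
    (an L-infinity ball) and return the first x + delta whose prediction
    differs from model(x). Returning None is only evidence of robustness:
    perturbations that were never sampled may still change the prediction."""
    rng = random.Random(seed)  # fixed seed so a test run is reproducible
    original = model(x)
    for _ in range(trials):
        perturbed = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        if model(perturbed) != original:
            return perturbed
    return None


# Usage: a point far from the decision boundary vs. one close to it.
model = linear_classifier([1.0, -1.0], 0.0)
far = find_counterexample(model, [2.0, 0.0], epsilon=0.1)    # no flip is possible here
near = find_counterexample(model, [0.05, 0.0], epsilon=0.1)  # expected to find a flip
```

For the far point, no perturbation of size 0.1 can push the score below zero, so the search correctly comes back empty; for the near point, a flipping input is found quickly. A property like fairness does not reduce to a per-input check this simple, which is part of why it is harder to formalize.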
WIGGINS: Another point has to do with Ops—generalizing beyond just keeping a website up, to making sure a data-science application is continuing to work well. I mean, inputs can fail, abuse can occur, and models may be more brittle than thought. As I alluded to earlier, we need to continue monitoring the model as if it were a living thing. This also means thinking through how you’re going to monitor impacts on users, as well as your statistical metrics. There are some real engineering challenges to think about here in terms of how you are going to maintain observability for a data-science model that’s deployed, particularly since it will be retrained and refreshed regularly.
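Wiggins’s point about continuing to watch a deployed model’s statistical metrics can be illustrated with a simple drift check. The sketch below assumes the monitored quantity is the distribution of the model’s output scores and uses the Population Stability Index (PSI), one common drift statistic chosen here for illustration; it is not a method taken from the book. The bin edges, the 0.2 alert threshold, and the example scores are all hypothetical. It covers only the statistical side; monitoring the model’s impact on users needs separate measures.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of `values` in each bin [edges[i], edges[i+1]).
    Values outside [edges[0], edges[-1]] are counted in the nearest end bin."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = 0
        while i < len(counts) - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference: Sequence[float], live: Sequence[float],
                               edges: Sequence[float], floor: float = 1e-6) -> float:
    """PSI = sum over bins of (live_frac - ref_frac) * ln(live_frac / ref_frac).
    `floor` replaces empty-bin fractions so the logarithm stays finite."""
    psi = 0.0
    for r, c in zip(bin_fractions(reference, edges), bin_fractions(live, edges)):
        r, c = max(r, floor), max(c, floor)
        psi += (c - r) * math.log(c / r)
    return psi


def drift_alert(reference: Sequence[float], live: Sequence[float],
                edges: Sequence[float], threshold: float = 0.2) -> bool:
    """Hypothetical alert rule; 0.2 is a commonly quoted rule of thumb, not a standard."""
    return population_stability_index(reference, live, edges) > threshold


# Usage with hypothetical data: scores captured at deployment vs. scores seen later.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1, 0.3, 0.6, 0.9] * 25
todays_scores = [0.8, 0.9, 0.95, 0.6] * 25
alert = drift_alert(reference, todays_scores, edges)
```

Because the model will be retrained and refreshed regularly, a check like this would typically be rerun against a fresh reference window after each retraining rather than against a fixed baseline.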
FRIED: We have covered a lot of ground today. Any final thoughts you would like to leave people with?
SPECTOR: We hope the analysis rubric shows a path toward providing useful structure to data science.
WING: All four of us definitely believe in harnessing data for good, whether for a university, a business, or society at large. But there’s no escaping the breadth of topics that need consideration. The breadth certainly complicates data-science education.
WIGGINS: I would emphasize that we are often solving very hard problems—these are sometimes wicked problems—and we need due consideration of many underlying principles. We then need to act on them and do the very best we can to balance sometimes-conflicting goals.
NORVIG: As I said earlier, our field is growing up. We are having a genuine impact on the world, and we find that we have to think hard along many dimensions to achieve the best possible goals.