### Abstract

Machine learning models are brittle, and small changes in the training data can result in different predictions. We study the problem of proving that a prediction is robust to *data poisoning*, where an attacker can inject a number of malicious elements into the training set to influence the learned model. We target decision tree models, a popular and simple class of machine learning models that underlies many complex learning techniques. We present a sound verification technique based on *abstract interpretation* and implement it in a tool called Antidote. Antidote abstractly trains decision trees for an intractably large space of possible poisoned datasets. Due to the soundness of our abstraction, Antidote can produce proofs that, for a given input, the corresponding prediction would not have changed had the training set been tampered with or not. We demonstrate the effectiveness of Antidote on a number of popular datasets.

### 1. Introduction

Artificial intelligence, in the form of machine learning (ML), is rapidly transforming the world as we know it. Today, ML is responsible for an ever-growing spectrum of sensitive decisions—from loan decisions, to diagnosing diseases, to autonomous driving. Many recent works have shown how ML models are brittle,^{2,5,19,20,22} and with ML spreading across many industries, the issue of robustness in ML models has taken center stage. The research field that deals with studying robustness of ML models is referred to as *adversarial machine learning.* In this field, researchers have proposed many definitions that try to capture robustness to different *adversaries.* The majority of these works have focused on verifying or improving the model’s robustness to *test-time attacks*,^{8,9,21} where an adversary can craft small perturbations to input examples that fool the ML model into changing its prediction, for example, making the model think a picture of a cat is that of a zebra.^{4}

**Data-poisoning robustness.** This paper focuses on verifying *data-poisoning robustness*, which captures how robust a training algorithm *L* is to variations in a given *training set T.* Intuitively, applying *L* to the training set *T* results in a classifier (model) *M*, and in this paper, we are interested in how the trained model varies when producing perturbations of the input training set *T.*

The idea is that an adversary can produce slight modifications of the training set, for example, by supplying a small amount of malicious training points, to influence the produced model and its predictions. This attack model is possible when the data is curated, for example, *via* crowdsourcing or from online repositories, where attackers can try to add malicious elements to the training data. For instance, one might add malicious training points to affect a malware detection model,^{23} or a small number of images to bypass a facial recognition model.^{5}

**Verification challenges.** Data-poisoning robustness has been studied extensively.^{2,13,15,24,25} While some defenses have been proposed against specific attacks^{10,19}—that is, modifications to training sets—we are not aware of any technique that can formally verify that a given learning algorithm is robust to perturbations to a given training set. Verifying data-poisoning robustness of a given learner requires solving a number of challenges. First, the datasets over which the learner operates can have thousands of elements, making the number of modified training sets we need to consider intractably large to represent and explore explicitly. Second, because learners are complicated programs that employ complex metrics (e.g., entropy and loss functions), their verification requires new techniques.

**Our approach.** We present a new *abstraction-interpretation-based approach* on the problem of verifying data-poisoning robustness for *decision tree learners.* We choose decision trees because they are simple, widely used, interpretable models. We summarize our contributions as follows:

- Antidote: the first sound technique for verifying data-poisoning robustness for decision tree learners (§2).
- A
*trace-based view*of decision tree learning as a stand-alone algorithm that allows us to sidestep the challenging problem of reasoning about the set of all possible output trees a learner can output on different datasets (§3). - An
*abstract domain that concisely encodes sets of perturbed datasets*and the abstract transformers necessary to verify robustness of a decision tree learner (§4). - An evaluation of Antidote on five representative datasets from the literature. Antidote can prove poisoning robustness for all datasets in cases where an enumeration approach would be doomed to fail (§5).

### 2. Overview

In this section, we give an overview of decision tree learning, the poisoning robustness problem, and motivate our abstraction-based proof technique (Figure 1).

**Figure 1. High-level overview of our approach.**

**Decision tree learning.** Consider the dataset *T _{bw}* at the top of Figure 2. It is comprised of 13 elements with a single numerical feature. Each element is labeled as a white (empty) or black (solid) circle. We use

*x*to denote the feature value of each element. Our goal is to construct a decision tree that classifies a given number into white or black.

**Figure 2. Illustrative example.**

For simplicity, we assume that we can only build trees of depth 1, like the one shown at the bottom Figure 2. At each step of building a decision tree, the learning algorithm is looking for a predicate ϕ with the best score, with the goal of splitting the dataset into two pieces with *least diversity*; that is, most elements have the same class (formally defined usually using a notion of entropy). This is what we see in our example: Using the predicate *x* ≤ 10, we split the data-set into two sets, one that is mostly white (left) and one that is completely black (right). This is the best split we can have for our data, assuming we can only pick predicates of the form *x* ≤ *c*, for an integer *c.* Note that, while the set of predicates *x* ≤ *c* is infinite, for this dataset (and in general for any dataset), there exists only finitely many inequivalent predicates—for example, *x* ≤ 4 and *x* ≤ 5 split the dataset into the same two sets.

Given a new element for a classification, we check if it is ≤ 10, in which case we say it is white with probability 7/9—that is, the fraction of white elements such that ≤ 10. Otherwise, if the element is > 10, we say it is black with probability 1.

**Data-poisoning robustness.** Imagine we want to classify an input *x* but want to make sure the classification would not have changed had the training data been slightly different. For example, maybe some percentage of the data was maliciously added by an attacker to sway the learning algorithm, a problem known as *data poisoning.* Our goal is to check whether the classification of *x* is robust to data poisoning.

**A naïve approach.** Consider our running example and imagine we want to classify the number 5. Additionally, we want to prove that *removing up to two elements* from the training set would not change the classification of 5—that is, we assume that up to ~15% (or 2/13) of the dataset is contributed maliciously. The naïve way to do this is to consider every possible training dataset with up to two elements removed and retrain the decision tree. If all trees classify the input 5 as white, the classification is robust to this level of poisoning.

Unfortunately, this approach is intractable. Even for our tiny example, we have to train 92 trees
. For a dataset of 1,000 elements and a poisoning of up to 10 elements, we have ~10^{23} possibilities.

**An abstract approach.** Our approach to efficiently proving poisoning robustness exploits a number of insights. First, we can perform decision tree learning *abstractly* on a *symbolic set of training sets*, without having to deal with a combinatorial explosion. The idea is that the operations in decision tree learning, for example, selecting a predicate and splitting the dataset, do not need to look at every concrete element of a dataset, but at aggregate statistics (counts).

Recall our running example in Figure 2. Let us say that up to two elements have been removed. No matter what two elements you choose, the predicate *x* ≤ 10 remains one that gives *a* best split for the dataset. In cases of ties between predicates, our algorithm abstractly represents all possible splits. For each predicate, we can symbolically compute best- and worst-case scores in the presence of poisoning as an *interval.* Similarly, we can also compute an interval that overapproximates the set of possible classification probabilities. For instance, in the left branch of the decision tree, the probability will be [0.71, 1] instead of 0.78 (or 7/9). The best-case probability of 1 is when we drop the black points 0 and 4; the worst-case probability of 0.71 (or 5/7) is when we drop any two white points.

The next insight that enables our approach is that we *do not need to explicitly build the tree.* Since our goal is to prove robustness of a single input point, which effectively takes a single trace through the tree, we mainly need to keep track of the abstract training sets as they propagate along those traces. This insight drastically simplifies our approach; otherwise, we would need to somehow abstractly represent sets of elements of a tree data structure, a nontrivial problem in program analysis.

**Abstraction and imprecision.** We note that our approach is sound but necessarily incomplete; that is, when our approach returns “robust” the answer is correct, but there are robust instances for which our approach will not be able to prove robustness. There are numerous sources of imprecision due to overapproximation, for example, we use the *interval domain* to capture real-valued entropy calculations of different training set splits, as well as the final probability of classification.

**An involved example.** To further illustrate our technique, we preview one of our experiments. We applied our approach to the MNIST-1-7 dataset, which has been used to study data poisoning for deep neural networks^{19} and support vector machines.^{2} In our experiments, we checked whether Antidote could prove data poisoning for the inputs used in the same dataset when training a decision tree. For example, when applying Antidote to the image of the digit in Figure 3, Antidote proves that it is poisoning robust (always classified as a seven) for up to 192 poisoned elements in 90 seconds. This is equivalent to training on ~10^{432} datasets!

**Figure 3. Example MNIST-1-7 digit that is proven poisoning-robust by Antidote.**

### 3. Preliminaries

In this section, we begin by formally defining the *data-poisoning-robustness problem.* Then, we present a *trace-based* view of decision tree learning, which will pave the way for a poisoning robustness proof technique.

**3.1. The poisoning robustness problem**

In a typical supervised learning setting, we are given a learning algorithm *L* and a training set *T* ⊆ *X* · *Y* comprised of elements of some set *X*, each with its classification label from a finite set of classes *Y.* Applying *L* to *T* results in a classifier (or model): *M* : *X* → *Y.* For now, we assume that both the learning algorithm *L* and the models it learns are deterministic functions.

A *perturbed set* Δ(*T*) ∪ 2^{X x Y} defines a set of possible *neighboring* datasets of *T.* Our robustness definitions are relative to some given perturbation Δ. (In Section 4.1, we define a specific perturbed set that captures a particular form of data poisoning.)

DEFINITION 3.1 (POISONING ROBUSTNESS). *Fix a learning algorithm L, a training set T, and let* Δ(*T*) *be a perturbed set. Given an element x* ∈ *X, we say that x is robust to poisoning T if and only if*

*When T and* Δ *are clear from context, we will simply say that x is robust.*

In other words, no matter what dataset *T′* ∈ Δ(*T*) we use to construct the model *M* = *L*(*T′*), we want *M* to always return the same classification for *x.* returned for *x* by the model *L*(*T*) learned on the original training set *T.*

EXAMPLE 3.1. *Imagine we suspect that an attacker has contributed 10 training points to T, but we do not know which ones. We can define* Δ(*T*) *to be T as well as every subset of T of size |T|* – 10. *If an input x is robust for this definition of* Δ(*T*), *then no matter whether the attacker has contributed 10 training items or not, the classification of x does not change.*

**3.2. Decision trees: a trace-based view**

We will formalize a tree as the *set of traces* from the root to each of the leaves. As we will see, this trace-based view will help enable our proof technique.

A decision tree *R* is a finite set of traces, where each trace is a tuple (σ, *y*) such that σ is a sequence of Boolean predicates and *y* ∈ *Y* is the classification.

Semantically, a tree *R* is a function in *X* → *Y.* Given an input *x* ∈ *X*, applying *R*(*x*) results in a classification *y* from the trace (σ, *y*) ∈ *R* where *x* satisfies all the predicates in the sequence σ = [ϕ_{1}, …, ϕ_{n}], that is,
is true. We say a tree *R* is *well-formed* if for every *x* ∈ *X*, there exists exactly one trace (σ, *y*) ∈ *R* such that *x* satisfies all predicates in σ. In the following, we assume all trees are well-formed.

EXAMPLE 3.2 (DECISION TREE TRACES). *Consider the decision tree in Figure 2. It contains two traces, each with a sequence of predicates containing a single predicate:* ([*x* ≤ 10], *white*) *and* ([*x* > 10], *black*).

**3.3. Tree learning: a trace-based view**

We now present a simple decision tree learning algorithm, Trace. Then, in Section 4, we abstractly interpret Trace with the goal of proving poisoning robustness.

One of our key insights is that we do not need to explicitly represent the learned trees (i.e., the set of all traces), since our goal is to prove robustness of a *single input* point, which effectively takes a *single trace* through the tree. Therefore, in this section, we will define a *trace-based decision tree learning algorithm.* This is inspired by standard algorithms—such as CART,^{3} ID3,^{16} and C4.5^{17}—but *it is input-directed, in the sense that it only builds the trace of the tree that a given input x will actually traverse.*

**A trace-based learner.** Our trace-based learner Trace is shown in Figure 4. It takes a training set *T* and an input *x* and computes the trace traversed by *x* in the tree learned on *T.* Intuitively, if we compute the set of all traces Trace(*T*, *x*) for each *x* ∈ *T*, we get the full tree, the one that we would have traditionally learned for *T.*

**Figure 4. Trace-based decision tree learner Trace.**

The learner Trace repeats two core operations: (*i*) selecting a predicate ϕ with which to split the dataset (using bestSplit) and (*ii*) removing elements of the training set based on whether they satisfy the predicate ϕ (depending on *x*, using filter). The number of times the loop is repeated (*d*) is the maximum depth of the trace that is constructed. Throughout, we assume a fixed set of classes *Y* = {1, …, *k*}.

The mutable state of Trace is the triple (*T*, ϕ, σ): (*i*) *T* is the training set, which will keep getting refined (by dropping elements) as the trace is constructed. (*ii*) ϕ is the most recent predicate along the trace, which is initially undefined (denoted by ◇). (*iii*) σ is the sequence of predicates along the trace, which is initially empty.

**Predicate selection.** We assume that Trace is equipped with a finite set of predicates Φ with which it can construct a decision tree classifier; each predicate in Φ is a Boolean function in *X* → B.

bestSplit(*T*) computes a predicate ϕ^{*} ∈ Φ that splits the current dataset *T*—usually minimizing a notion of entropy. Ideally, the learning algorithm would consider every possible sequence of predicates to partition a dataset in order to arrive at an optimal classifier. For efficiency, a decision tree-learning algorithm does this greedily: It selects the best predicate it can find for a single split and moves on to the next split. To perform this greedy choice, it measures how diverse the two datasets resulting from the split are. We formalize this below:

We use *T*↓_{ϕ} to denote the subset of *T* that satisfies ϕ, that is,

Let Φ′ be the set of all predicates that do not trivially split the dataset: Φ′ = {*ϕ* ∈ Φ | *T*↓* _{ϕ}* ≠ ∅ ∧

*T*↓

*≠*

_{ϕ}*T*}. Finally, bestSplit(

*T*) is defined as follows:

where score(*T*, *ϕ*) = |*T*↓* _{ϕ}*|·ent(

*T*↓

*)+|*

_{ϕ}*T*↓

*|·ent(*

_{¬ϕ}*T*↓

*). Informally, bestSplit(*

_{¬ϕ}*T*) is the predicate that splits

*T*into two sets with the lowest entropy, as defined by the function ent in Figure 5. Formally, ent computes

*Gini impurity*, which is used, for instance, in the CART algorithm.

^{3}Note that if Φ′ = ∅, we assume bestSplit(

*T*) is undefined (returns ◇). If multiple predicates are possible values of bestSplit(

*T*), we assume one is returned nondeterministically. Later, in Section 4, our abstract interpretation of Trace will actually capture all possible predicates in the case of a tie.

**Figure 5. Auxiliary operator definitions. ent is Gini impurity; cprob returns a vector of classification probabilities, one element for each class i ∈ [1, k].**

EXAMPLE 3.3. *Recall our example from Section 2 and Figure 2. For readability, we use T instead of T _{bw} for the dataset. Let us compute* score(

*T, ϕ*),

*where ϕ is x*≤ 10.

*We have*|

*T*↓

*| = 9*

_{ϕ}*and*|

*T*↓

*| = 4.*

_{¬ϕ}*For the classification probabilities, defined by*cprob

*(Figure 5), we have*cprob(

*T*↓

*) = 〈7/9, 2/9〉*

_{ϕ}*and*cprob(

*T*↓

*) = 〈0, 1〉*

_{¬ϕ}*assuming the first element represents white classification; for example, in T*↓

*,*

_{ϕ}*there’s a*7/9

*chance of being classified as white. For*ent,

*we have*ent(

*T*↓

*) ≈ 0.35*

_{ϕ}*and*ent(

*T*↓

*) = 0.*

_{¬ϕ}*Since T*↓

_{ϕ}*only comprises black points, its Gini impurity is 0.*

The score of *x* ≤ 10 *is therefore* ~3.1. *For the predicate x* ≤ 11, *we get the higher (worse) score of* ~3.2, *as it generates a more diverse split.*

**Filtering the dataset.** The operator filter removes elements of *T* that evaluate differently than *x* on ϕ. Formally,

**Learner result.** When Trace terminates in a state (*T _{r}*, ϕ

_{r}, σ

_{r}), we can read the classification of

*x*as the class

*i*with the highest number of training elements in

*T*.

_{r}Using cprob, in Figure 5, we compute the probability of each class *i* for a training set *T* as a vector of probabilities. Trace returns the class with the highest probability:

As before, in case of a tie in probabilities, we assume a nondeterministic choice.

EXAMPLE 3.4. *Following the computation from Ex. 3.3, Trace*(*T*, 18) *terminates in state* (*T*↓_{x>10}, *x* ≤ 10, [*x* > 10]). *Point 18 is associated with the trace* [*x* > 10] *and is classified as black because* cprob(*T*↓_{x>10}) = 〈0, 1〉.

### 4. Abstract Poisoned Semantics

In this section, we begin by defining a data-poisoning model in which an attacker contributes a number of malicious training items. Then, we demonstrate how to apply the trace-based learner Trace to *abstract sets of training sets*, allowing us to efficiently prove poisoning robustness.

We consider a poisoning model where the attacker has contributed up to *n* elements of the training set—we call it *n-poisoning.* Formally, given a training set *T* and a natural number *n* ≤ |*T*|, we define the following perturbed set:

In other words, Δ_{n}(*T*) captures every training set the attacker could have possibly started from to arrive at *T.*

This definition of dataset poisoning matches many settings studied in the literature.^{5,19,23} The idea is that an attacker has contributed a number of malicious data points into the training set to influence the resulting classifier.

We do not know which *n* points in *T* are the malicious ones, or if there are malicious points at all. Thus, the set Δ_{n}(*T*) captures every possible subset of *T* where we have removed up to *n* (potentially malicious) elements. Our goal is to prove that our classification is robust to up to *n* possible poisoned points added by the attacker. So if we try every possible dataset in Δ_{n}(*T*) and they all result in the same classification on *x*, then *x* is robust regardless of the attacker’s potential contribution.

Observe that
. So even for relatively small datasets and number *n*, the set of possibilities is massive, for example, for MNIST-1-7 dataset (§5), for *n* = 50, we have about 10^{141} possible training sets in Δ_{n}(*T*).

**4.2. Abstractions for verifying n-poisoning**

Our goal is to efficiently evaluate Trace on an input *x* for all possible training datasets in Δ* _{n}*(

*T*). If all of them yield the same classification

*y*, then we know that

*x*is a robust input. Our insight is that we can abstractly interpret Trace on a symbolic set of training sets without having to fully expand it into all of its possible concrete instantiations.

Recall that the state of Trace is (*T, ϕ, σ*); for our purposes, we do not have to consider the sequence of predicates σ, as we are only interested in the final classification, which is a function of *T.* In this section, we present the *abstract domains* for each component of the learner’s state.

**Abstract training sets.** Abstracting training sets is the main novelty of our technique. We use the *abstract element* 〈*T′*, *n′*〉 to denote a set of training sets and it captures the definition of Δ_{n′} (*T′*): For every training set *T′* and number *n′*, the concretization function—that is, the function that maps an abstract element to the set of concrete elements it represents—is γ (〈*T′*, *n′*〉) = Δ_{n′} (*T′*). Therefore, we have that initially the *abstraction* function α—that is., the function that maps a set of concrete elements to an abstract one—which is defined as *α*(Δ_{n} (*T*)) = 〈*T, n*〉 is precise. Note that an abstract element 〈*T′*, *n′*〉 succinctly captures a large number of concrete sets, Δ_{n′} (*T′*). Further, all operations we perform on 〈*T′*, *n′*〉 will only modify *T′* and *n′*, without resorting to concretization.

Elements in the domain are ordered so that 〈*T*_{1}, *n*_{1}〉 ⊑ 〈*T*_{2}, *n*_{2}〉 if and only if *T*_{1} ⊆ *T*_{2} ∧ *n*_{1} ≤ *n*_{2} − |*T*_{2} \ *T*_{1}|.

We can define an *efficient* join operation—that is, the operation that given two abstract elements, returns an abstract element that subsumes both—on two elements in the abstract domain as follows:

DEFINITION 4.1 (JOINS). *Given two training sets T*_{1}, *T*_{2} *and n*_{1}, *n*_{2} ∈ ℕ, 〈*T*_{1}, *n*_{1}〉 ⊔ 〈*T*_{2}, *n*_{2}〉 ≔ 〈*T*′, *n*′〉 *where* *T*′ = *T*_{1} ∪ *T*_{2} *and n′* = max(|*T*_{1} \ *T*_{2}| + *n*_{2}, | *T*_{2} \ *T*_{1} | + *n*_{1}).

Notice that the join of two sets is an overapproximation of the union of the two sets.

PROPOSITION 4.1. *For any* *T*_{1}, *T*_{2}, *n*_{1}, *n*_{2}*:*

EXAMPLE 4.1. *For any training set T*_{1}, *if we consider the abstract sets* 〈*T*_{1}, 2〉 *and* 〈*T*_{1}, 3〉, *because the second set represents strictly more concrete training sets, we have*

*Now consider the training set T*_{2} = {*x*_{1}, *x*_{2}}. *We have*

*Notice how the join increased the poisoned elements from 2 to 3 to accommodate for the additional element x*_{3}.

**Abstract predicates and numeric values.** When abstractly interpreting what predicates the learner might choose for different training sets, we will need to abstractly represent sets of possible predicates. Simply, a set of predicates is abstracted *precisely* as the corresponding set of predicates Ψ—that is, for every set Ψ, we have α(Ψ) = Ψ and γ(Ψ) = Ψ. Moreover, Ψ_{1} ⊔ Ψ_{2} = Ψ_{1} ∪ Ψ_{2}. For certain operations, it will be handy for Ψ to contain a special null predicate ◇.

When abstractly interpreting numerical operations, such as cprob and ent, we will need to abstract sets of numerical values. We do so using the standard *intervals* abstract domain (denoted [*l*, *u*]). For instance, *α*({0.2, 0.4, 0.6}) = [0.2, 0.6] and γ([0.2, 0.6]) = {*x* | 0.2 ≤ *x* ≤ 0.6}. The join of two intervals is defined as [*l*_{1}, *u*_{1}] ⊔ [*l*_{2}, *u*_{2}] = [min(*l*_{1}, *l*_{2}), max(*r*_{1}, *r*_{2})]. Interval arithmetic follows the standard definitions.

**4.3. Abstract learner Trace ^{#}**

We are now ready to define an abstract interpretation of the semantics of our decision tree learner, denoted Trace^{#}.

**Abstract domain.** Recall that the state of Trace is (*T, ϕ, σ*); for our purposes, we do not have to consider the sequence of predicates *σ*, as we are only interested in the final classification, which is a function of *T.* Using the domains described in Section 4.2, at each point in the learner, our abstract state is a pair (〈*T′*, *n′*〉, Ψ′) that tracks the current set of training sets and the current set of possible most recent predicates the algorithm has split on (for all considered training sets).

When verifying *n*-poisoning for a training set *T*, the initial abstract state of the learner will be the pair (〈*T*, *n*〉, {◇}). In the rest of the section, we define the abstract semantics (i.e., our abstract transformers) for all the operations performed by Trace^{#}. For operations that only affect one element of the state, we assume the other component is unchanged.

**4.4. Abstract semantics of auxiliary operators**

We will begin by defining the abstract semantics of the auxiliary operations in the algorithm before proceeding to the core operations, filter and bestSplit.

We begin with
, that is, the abstract analog of *T*↓* _{ϕ}*.

It removes elements not satisfying *ϕ* from *T*; since the size of *T*↓* _{ϕ}* can go below

*n*, we take the minimum of the two.

PROPOSITION 4.2. *Let* *T*′ ∈ γ(〈*T, n*〉). *For any predicate ϕ*, *we have*
.

Now, consider cprob(*T*), which returns a vector of probabilities for different classes. Its abstract version returns an interval for each probability, denoting the lower and upper bounds based on the training sets in the abstract set:

where *c*_{i} = |{(*x*, *i*) ∈ *T*}|. In other words, for each class *i*, we need to consider the best- and worst-case probability based on removing *n* elements from the training set, as denoted by the denominator and the numerator. Note that in the corner case where *n* = |*T*|, we set cprob^{#}(〈*T, n*〉) = 〈[0, 1]〉_{i∈[1, k]}.

PROPOSITION 4.3. *Let* *T*′ ∈ γ(〈*T, n*〉). *Then,*

*where* γ (cprob^{#}(〈*T*, *n*〉)) *is the set of all possible probability vectors in the vector of intervals.*

EXAMPLE 4.2. *Consider the training set on the left side of the tree in Figure 2; call it T _{ℓ}. It has 7 white elements and 2 black elements.* cprob(

*T*) = 〈7/9, 2/9〉,

_{ℓ}*where the first element is the white probability.*cprob

^{#}(〈

*T*, 2〉)

_{ℓ}*produces the vector*〈[5/9, 1], [0, 2/7]〉.

*Notice the loss of precision in the lower bound of the first element. If we remove two white elements, we should get a probability of*5/7,

*but the interval domain cannot capture the relation between the numerator and denominator in the definition of*cprob

^{#}.

The abstract version of the Gini impurity is identical to the concrete one, except that it performs interval arithmetic:

Each term *ι*_{i} denotes an interval.

**4.5. Abstract semantics of filter**

We are now ready to define the abstract version of filter. Since we are dealing with abstract training sets, as well as a set of predicates Ψ, we need to consider for each *ϕ* ∈ Ψ all cases where *x* ⊨ *ϕ* or *x* ⊨ ¬*ϕ*, and take the join of all the resulting training sets (Definition 4.1). Let

Then,

PROPOSITION 4.4. *Let* *T*′ ∈ γ(〈*T, n*〉) *and* *ϕ*′ ∈ Ψ. *Then,*

EXAMPLE 4.3. *Consider the full dataset T*_{bw} *from Figure 2. For readability, we write T instead of T*_{bw} *in the example. Let x denote the input with numerical feature 4, and let* Ψ = {*x* ≤ 10}. *First, note that because* Ψ _{¬x} *is the empty set, the right-hand side of the result of applying the* filter^{#} *operator will be the bottom element* 〈Ø, 0〉 *(i.e., the identity element for* ⊔*). Then,*

**4.6. Abstract semantics of bestSplit**

We are now ready to define the abstract version of bestSplit. We begin by defining bestSplit^{#} without handling trivial predicates, then we refine our definition.

**Minimal intervals.** Recall that in the concrete case, bestSplit returns a predicate that minimizes the function score(*T, ϕ*). To lift bestSplit to the abstract semantics, we define score^{#}, which returns an interval, and what it means to be a *minimal* interval—that is, the interval corresponding to the abstract minimal value of the objective function score^{#}(*T, ϕ*).

Lifting score(*T, ϕ*) to score^{#}(〈*T*, *n〉, ϕ*) can be done using the sound transformers for the intermediary computations:

where |〈*T, n*〉| ≔ [|*T*| − *n*, |*T*|].

However, given a set of predicates Φ, bestSplit^{#} must

return the ones with the minimal scores. Before providing the formal definition, we illustrate the idea with an example.

EXAMPLE 4.4. *Imagine a set of predicates* Φ = {*ϕ*_{1}, *ϕ*_{2}, *ϕ*_{3}, *ϕ*_{4}} *with the following intervals for* score^{#}(〈*T*, *n*〉, *ϕ*_{i}).

*Notice that ϕ*_{1} *has the lowest upper bound for score (denoted in red and named lub*_{Φ}*). Therefore, we call* score^{#}(〈*T*, *n*〉, *ϕ*_{1}) *the* minimal interval *with respect to* Φ. bestSplit^{#} *returns all the predicates whose scores* overlap *with the minimal interval* score^{#}(〈*T*, *n*〉, *ϕ*_{1}), *which in this case are* *ϕ*_{1}, *ϕ*_{2}, *and* *ϕ*_{3}. *This is because there is a chance that* *ϕ*_{1}, *ϕ*_{2} *and* *ϕ*_{3} *are indeed the predicates with the best score, but our abstraction has lost too much precision for us to tell conclusively.*

Let lb/ub be functions that return the lower/upper bound of an interval. First, we define the lowest upper bound among the abstract scores of the predicates in Φ as

We can now define the set of predicates whose score overlaps with the minimal interval as (we avoid some corner cases here for clarity):

LEMMA 4.1. *Let T′* ∈ γ (〈*T*, *n*〉)). *Then*,

We abstractly interpret conditionals in Trace, as is standard, by taking the join of all abstract states from the feasible *then* and *else* paths. In Trace, there are two branching statements of interest for our purposes, one with the condition ent(*T*) = 0 and one with *ϕ* = ◇.

Let us consider the condition *ϕ* = ◇. Given an abstract state (〈*T*, *n*〉, Ψ), we simply set Ψ = {◇} and propagate the state to the *then* branch (unless, of course, ◇ ∉ Ψ, in which case we omit this branch). For *ϕ* ≠ ◇, we remove ◇ from Ψ and propagate the resulting state through the *else* branch.

Next, consider the conditional ent(*T*) = 0. For the *then* branch, we need to *restrict* an abstract state (〈*T*, *n*〉, Ψ) to training sets with 0 entropy: Intuitively, this occurs when all elements have the same classification. We ask: *Are there any concretizations composed of elements of the same class?*, and we proceed through the *then* branch with the following training set abstraction:

The idea is as follows: The set *T′* defines a subset of *T* containing only elements of class *i.* But if we have to remove more than *n* elements from *T* to arrive at *T′*, then the conditional is not realizable by a concrete training set of class *i*, and so we return the empty abstract state.

In the case of ent(*T*) ≠ 0 (*else* branch), we soundly (imprecisely) propagate the original state without restriction.

**4.8. Soundness of abstract learner**

Trace^{#} soundly overapproximates the results of Trace and can therefore be used to prove robustness to *n*-poisoning. Let *I* = ([*l*_{1}, *u*_{2}], …, [*l*_{k}, *u*_{k})] be a set of intervals. We say that interval [*l _{i}*,

*u*] dominates

_{i}*I*if and only if

*l*>

_{i}*u*for every

_{j}*j*≠

*i*.

THEOREM 4.2. *Let* 〈*T′*, *n′*〉 *be the final abstract state of Trace ^{#}*(〈

*T*,

*n*〉,

*x*).

*If I*= cprob

^{#}(〈

*T′*,

*n′*〉))

*and there exists an interval in I that dominates I (i.e., same class is selected for every T*∈ γ(〈

*T*,

*n*〉)

*), then x is robust to n-poisoning of T.*

The state abstraction used by Trace^{#} can be *imprecise*, mainly due to the join operations that take place, for example, during filter^{#}. The primary concern is that we are forced to perform a very imprecise join between possibly quite dissimilar training set fragments. Consider the following example:

EXAMPLE 4.5. *Let us return to T*_{bw} *from Figure 2, but imagine we have continued the computation after filtering using x* ≤ 10 *and have selected some best predicates. Specifically, consider a case in which we have x* = 4 *and*

- 〈
*T*, 1〉,*where T*= {0, 1, 2, 3, 4, 7, 8, 9, 10} - Ψ = {
*x*≤ 3,*x*≤ 4} (*ignoring whether this is correct*)

*Let us evaluate* filter^{#}(〈*T*, 1〉, Ψ, *x*). *Following the definition of* filter^{#}, *we will compute*

*where T*_{≤4} = {(4, *b*), (3, *w*), (2, *w*), (1, *w*), (0, *b*)} *and T*_{>3} = {(4, *b*), (7, *w*), (8, *w*), (9, *w*), (10, *w*)}, *thus giving us T′* = *T (the set we began with) and n′* = 5 *(much larger than what we began with). This is a large loss in precision.*

To address this imprecision, we will consider a *disjunctive* version of our abstract domain, consisting of unboundedly many disjuncts of this previous domain, which we represent as a set {(〈*T*, *n*〉_{i}, 〈_{i}, Ψ_{i})}_{i}. Our join operation becomes very simple: It is the union of the two sets of disjuncts.

DEFINITION 4.2 (JOINS). *Given two disjunctive abstractions* *D _{I}* = {(〈

*T*,

*n*〉

_{i}, Ψ

_{i})}

_{i∈I}

*and*

*D*

_{J}= {(〈

*T*,

*n*〉

_{j}, Ψ

_{j})}

_{j∈J},

*we define*

Adapting Trace^{#} to operate on this domain is immediate: Each of the transformers described in the previous section is applied to each disjunct. Because our disjunctive domain eschews memory efficiency and time efficiency for precision, we are able to prove more things, but at a cost.

### 5. Implementation and Evaluation

We implemented our algorithms Trace and Trace^{#} in C++ in a (single-threaded) prototype we call Antidote. Our evaluation aims to answer the following research questions:

**RQ1**Can Antidote prove data-poisoning robustness for real-world datasets?**RQ2**How does the performance of Antidote vary with respect to the scale of the problem and the choice of abstract domain?

**5.1. Benchmarks and experimental setup**

We experiment on 5 datasets (Table 1). We obtained the first three datasets from the UCI Machine Learning Repository.^{7} Iris is a small dataset that categorizes three related flower species; Mammographic Masses and Wisconsin Diagnostic Breast Cancer are the two datasets of differing complexities related to classifying whether tumors are cancerous. We also evaluate the widely studied MNIST dataset of handwritten digits,^{11} which consists of 70,000 grayscale images (60,000 training, 10,000 test) of the digits zero through nine. We consider a form of MNIST that has been used in the poisoning literature and create another variant for evaluation:

**Table 1. Detailed metrics for the benchmark datasets considered in our evaluation.**

- We make the same simplification as in other work on data poisoning
^{2,19}and restrict ourselves to the classification of ones versus sevens (13,007 training instances and 2163 test instances), which we denote MNIST-1-7-Real. This benchmark has been recently used to study poisoning in support vector machines.^{19} - Each MNIST-1-7-Real image’s pixels are 8-bit integers (which we treat as real-valued); to create a variant of the problem with reduced scale, we
*also*consider MNIST-1-7-Binary, a black-and-white version that uses each pixel’s most significant bit (i.e., our predicates are Boolean).

For each dataset, we consider a decision tree learner with a maximum tree depth (i.e. the number of calls to bestSplit) ranging from 1 to 4. Table 1 shows that test set accuracies of the decision trees learned by Trace are reasonably high—affirmation that when we prove the robustness of its results, we are proving something worthwhile.

**Experimental setup.** For each test element, we explore the amount of poisoning (i.e., how large of a *n* from our Δ_{n} model) for which we can prove the robustness property as follows.

- For each combination of dataset
*T*and tree depth*d*, we begin with a poisoning amount*n*= 1; that is, a single element could be missing from the training set. - For each test set element
*x*, we attempt to prove that*x*is robust to poisoning*T*using any set in Δ_{n}(*T*). Let*S*be the test subset for which we do prove robustness for poisoning amount_{n}*n.*If*S*is non-empty, we double_{n}*n*and again attempt to verify the property on elements in*S*._{n} - If at a depth
*n*all instances fail, we binary search between*n*and*n*/2 to find an*n*/2 <*n′*<*n*at which some instances terminate. This approach allows us to better illustrate the experiment trends in our plots.

Failure occurs due to any of three cases: (*i*) The computed over-approximation does not conclusively prove robustness, (*ii*) the computation runs out of memory, or (*iii*) the computation exceeds a one-hour time-out. We run the entire procedure for the non-disjunctive and disjunctive abstract domains.

**5.2. Effectiveness of antidote**

We evaluate how effective Antidote is at proving data-poisoning robustness. In this experiment, we consider a run of the algorithm on a single test element successful if either the non-disjunctive or disjunctive abstract domain succeeds at proving robustness (mimicking a setting in which two instances of Trace^{#}, one for each abstract domain, are run in parallel)—we contrast the results for the different domains in the original paper. Figure 6 shows these results.

**Figure 6. Fraction of test instances proven robust versus poisoning parameter n (log scale). The dotted line is a visual aid, indicating n is 1% of the training set size.**

To exemplify the power of Antidote, draw your attention to the depth-2 instance of Trace^{#} invoked on MNIST-1-7-Real. For 38/100 test instances, we are able to verify that even if the training set had been poisoned by an attacker who contributed up to 64 poisoned elements
, the attacker would not have had any power to change the resulting classification. Conventional machine learning wisdom says that, in decision tree learning, small changes to the training set can cause the model to behave quite differently. Our results verify nuance—sometimes, there is some stability. These 38 verified instances average ~800*s* run time. Δ_{64}(*T*) consists of ~10^{174} concrete training sets; this is staggeringly efficient compared to a naïve enumeration baseline, which would be unable to verify robustness at this scale.

To answer **RQ1**, *Antidote can verify robustness for real-world datasets with extremely large perturbed sets and decision tree learners with high accuracies.*

We delegate the results for **RQ2** to the original paper, but we note that decision tree depth the number of poisoned elements are the most important factors for the scalability of our approach: Larger trees and more poisoning mean a higher loss of precision, and therefore less proofs. Similarly, larger trees lead to scalability issues as the number of abstract datasets we maintain may grow exponentially.

### 6. Related Work

**Data poisoning.** Data-poisoning robustness has been studied extensively from an attacker perspective.^{2, 13, 15, 24, 25} This body of work has demonstrated attacks that can degrade classifier (in particular, SVMs) accuracy, sometimes dramatically. Our approach instead *proves* that no poisoning attack exists using abstract interpretation, while existing techniques largely provide search techniques for finding poisoned training sets. Recently, we have showed that our technique is robust and can handle different poisoning models,^{14} for example, ones in which the attacker can remove, add, or modify data. This recent work also shows how poisoning can disparately impact different parts of the test data (e.g., when dividing a test data where each element corresponds to a person based on ethnicity), therefore opening new problems in the field of algorithmic fairness.

Some recent techniques modify the training processes and hypothesis class to prove robustness to data poisoning,^{12,18} while our approach proves that a certain algorithm is robust to poisoning with respect to a given data set. Additionally, these approaches provide *probabilistic* guarantees about certain kinds of attacks on a modified training algorithm, while our work provides deterministic guarantees.

**Abstract interpretation for robustness.** Abstract interpretation^{6} is a framework for static program analysis. Our work is inspired by abstract interpretation-based verification of neural networks.^{1} However, we tackle the problem of verifying training time robustness, while existing works focus on test-time robustness. The former problem requires abstracting sets of training sets, while the latter only requires abstracting sets of individual inputs.

### Acknowledgments

This material is based upon work supported by the National Science Foundation under grant numbers 1652140, 1704117, 1750965, and 1918211. The work is also supported by a Facebook Award and a Microsoft Faculty fellowship.

## Join the Discussion (0)

## Become a Member or Sign In to Post a Comment