Over the last decade, machine learning has revolutionized entire areas of science and engineering, ranging from drug discovery and medical diagnostics to autonomous driving and natural language processing. Despite this impressive progress, it has become increasingly evident that modern machine learning models suffer from several issues which, if not resolved, could prevent their widespread adoption. Example challenges include the lack of robustness guarantees under slight distribution shifts, the reinforcement of unfair biases present in the training data, and the leakage of sensitive information through the model.
Addressing these issues, by inventing new methods and tools for establishing that machine learning models enjoy certain desirable guarantees, is critical, especially in domains where safety and security are paramount. Indeed, over the last few years there has been substantial research progress on new techniques aiming to address the above issues, with most work so far focusing on perturbations applied to the inputs of the model. For instance, the community has developed novel verification methods for proving that a model always classifies a sample (for example, an image) to the same label regardless of certain transformations (for example, an arbitrary rotation of up to five degrees). New, more sophisticated methods are constantly being invented, targeting different properties, different types of guarantees (probabilistic or deterministic), and different application domains (for example, natural language or visual perception).
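To make the rotation example concrete, such an input-robustness specification can be stated as a short universally quantified formula; the notation here is illustrative, with f denoting the classifier and R_theta the operator that rotates an input image x by angle theta:

```latex
\forall \theta \in [-5^{\circ}, +5^{\circ}]:\quad f\big(R_{\theta}(x)\big) = f(x)
```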
Despite this remarkable progress, there has been comparatively little work on proving that models work as intended when their training data has been manipulated (as opposed to manipulating test- or inference-time inputs). Addressing this challenge is important: otherwise an adversary can poison the training data, by inserting or modifying samples, to force the model into specific mis-predictions and thus degrade its performance at inference time. This predicament can easily occur in practice, for instance when models rely on large amounts of public (and therefore potentially poisoned) data for their pretraining phase. At the same time, the problem is inherently difficult: proving that standard test-time performance is robust to changes of the training data requires reasoning about a potentially intractable number of models, each induced by a different subset of the training data points. A naive solution, in which one trains a separate model for each such subset and then runs the test set through each of them, is therefore practically infeasible, which raises the following fundamental question: how can we reason about an intractably large set of trained models without enumerating them?
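To make the scale of the problem concrete, here is a minimal sketch of the naive enumeration baseline under one common poisoning model, in which an adversary may have injected up to k training points, so that robustness amounts to checking every dataset obtained by deleting up to k points. The names `train`, `dataset`, and the model's `predict` method are placeholders, not the paper's actual interface.

```python
from itertools import combinations

def certify_by_enumeration(train, dataset, x, k):
    """Naive baseline: retrain on every subset of the data obtained by
    deleting at most k (potentially poisoned) points, and check that all
    resulting models agree on the label of the test point x."""
    n = len(dataset)
    labels = set()
    for r in range(k + 1):
        for removed in combinations(range(n), r):
            removed = set(removed)
            subset = [p for i, p in enumerate(dataset) if i not in removed]
            model = train(subset)            # retrain from scratch
            labels.add(model.predict(x))
            if len(labels) > 1:
                return False                 # some deletion flips the prediction
    # All sum_{r=0..k} C(n, r) models agree -- a count that explodes for large n and k.
    return True
```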
The following PLDI paper addresses this challenging question in a clean, beautiful, and elegant manner. The key insight is to connect ideas from abstract interpretation with machine learning by representing the intractably large set of trained models via a symbolic abstraction and introducing what amounts to a symbolic learning algorithm: rather than training a concrete model on a concrete dataset, the paper shows how to train a symbolic model (capturing an intractable number of concrete models) on a symbolic dataset (capturing an intractable number of possible concrete dataset variants). Creating this symbolic learning algorithm is non-trivial: it requires introducing symbolic (abstract) counterparts for all key steps of the concrete learning algorithm (for example, those computing outcome probabilities) and capturing their effect on the symbolic model. A further insight is to train one symbolic model per test point, which enables a more scalable construction than a procedure agnostic to the test set. Finally, given the symbolic model for a test point, one tries to prove, via standard symbolic methods, that the point receives the same label under every concrete model the symbolic model captures.
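A short sketch cannot do justice to the full symbolic learning algorithm, but the following toy example conveys the flavor of abstracting a single step of it. Assume, purely for illustration (the paper's abstract domain is considerably richer), an interval abstraction over the label counts seen at one decision-tree leaf when up to k training points may have been inserted or deleted: the leaf's prediction for a test point is certified only when one label's worst-case count still beats every other label's best case. All names below are hypothetical.

```python
def certified_leaf_label(label_counts, k):
    """Toy interval abstraction of a decision-tree leaf under data poisoning.

    label_counts maps each label to the number of training points with that
    label reaching the leaf; the adversary may insert or delete up to k
    points, so each count can move anywhere within [count - k, count + k].
    Returns the certified label, or None if this (incomplete) abstraction
    cannot prove the prediction stable.
    """
    intervals = {lbl: (max(c - k, 0), c + k) for lbl, c in label_counts.items()}
    for lbl, (lo, _) in intervals.items():
        # Certified if lbl's lower bound beats every other label's upper bound.
        if all(lo > hi for other, (_, hi) in intervals.items() if other != lbl):
            return lbl
    return None

# With counts 10 vs 4, the prediction is certified for k = 2 (since 8 > 6) but
# not for k = 3 (the intervals [7, 13] and [1, 7] touch), even though every
# concrete poisoning of at most 3 points at this leaf still yields the same
# label -- a small instance of the incompleteness discussed below.
print(certified_leaf_label({"A": 10, "B": 4}, 2))  # A
print(certified_leaf_label({"A": 10, "B": 4}, 3))  # None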
Naturally, because the abstractions used are incomplete (a deliberate trade-off for scalability), this final proof step may fail even though the property actually holds. Importantly, however, the technique is sound: if the method states that a given point is classified to a particular label under the symbolic model, then that statement is guaranteed to hold for every concrete model the abstraction captures. To demonstrate the overall concept, the paper focuses on a popular class of machine learning models, namely decision trees, but the ideas it introduces are broadly applicable.
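In illustrative notation, writing A(D') for the concrete model learned from a training set D', Delta(D) for the set of training sets obtainable from D under the allowed poisoning, and Certify for the symbolic procedure, the soundness guarantee is one-directional:

```latex
\mathrm{Certify}(D, x) = \ell \;\Longrightarrow\; \forall D' \in \Delta(D):\; A(D')(x) = \ell
```

The converse need not hold: a failed proof does not mean the property is violated.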
The paper is a beautiful reference and a must-study for anyone interested in learning how to fruitfully connect, in a clean and elegant manner, the seemingly disparate worlds of data-driven optimization and symbolic techniques. Indeed, the creative fusion of these two fields will likely drive many new advances in next-generation AI systems. As to future work, now that we know progress is feasible, one would expect the paper to trigger much interest in proving properties involving training set perturbations, as well as in extending and applying these ideas to other kinds of models and application domains. This would necessarily require novel abstractions, resulting in new kinds of symbolic learning algorithms.