The seminar is an international online event focused on exploring the theoretical foundations of interpretable and explainable AI. Its goal is to exchange ideas and form a supportive community for those interested in the topic.

Practicalities

Organizers:

Schedule

2024/2025

October 10 Ulrike von Luxburg ...
Abstract

...

September 5 Lesia Semenova ...
Abstract

...

2023/2024

July 11 Sanjoy Dasgupta Recent progress on interpretable clustering
Abstract

The widely-used k-means procedure returns k clusters that have arbitrary convex shapes. In high dimensions, such a clustering might not be easy to understand. A more interpretable alternative is to constrain the clusters to be the leaves of a decision tree with axis-parallel splits; then each cluster is a hyper-rectangle given by a small number of features. Is it always possible to find clusterings that are interpretable in this sense and yet have k-means cost that is close to the unconstrained optimum? A recent line of work has answered this in the affirmative and moreover shown that these interpretable clusterings are easy to construct. I will give a survey of these results: algorithms, methods of analysis, and open problems.
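
As a toy illustration of the setting described in this abstract, the Python sketch below computes the k-means cost of a partition and searches exhaustively for the single axis-parallel cut (the k = 2 case of a decision tree) with the lowest cost. It is not one of the algorithms surveyed in the talk; the function names `kmeans_cost` and `best_axis_cut` and the data points are made up for illustration.

```python
def kmeans_cost(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        center = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        cost += sum((p[j] - center[j]) ** 2 for p in members for j in range(dim))
    return cost


def best_axis_cut(points):
    """Brute-force search over cuts of the form 'feature j <= threshold'.

    Returns (cost, feature index, threshold) of the cheapest two-leaf tree.
    Assumption: exhaustive search is only meant for tiny toy inputs.
    """
    best = None
    dim = len(points[0])
    for j in range(dim):
        values = sorted({p[j] for p in points})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            labels = [0 if p[j] <= thr else 1 for p in points]
            cost = kmeans_cost(points, labels)
            if best is None or cost < best[0]:
                best = (cost, j, thr)
    return best


# Hypothetical 2-D data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
cost, feature, threshold = best_axis_cut(points)
print(f"cut on feature {feature} at {threshold}, k-means cost {cost:.3f}")
```

On this toy data the search selects a cut on the first coordinate at 3.0, which separates the two groups, so each cluster is described by a single feature and a single threshold.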

June 20 Blair Bilodeau Impossibility Theorems for Feature Attribution
Abstract

Despite a sea of interpretability methods that can produce plausible explanations, the field has also empirically seen many failure cases of such methods. In light of these results, it remains unclear for practitioners how to use these methods and choose between them in a principled way. In this paper, we show that for moderately rich model classes (easily satisfied by neural networks), any feature attribution method that is complete and linear—for example, Integrated Gradients and SHAP—can provably fail to improve on random guessing for inferring model behaviour. Our results apply to common end-tasks such as characterizing local model behaviour, identifying spurious features, and algorithmic recourse. One takeaway from our work is the importance of concretely defining end-tasks: once such an end-task is defined, a simple and direct approach of repeated model evaluations can outperform many other complex feature attribution methods. Paper: https://arxiv.org/abs/2212.11870
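
To make the term "complete" concrete, here is a minimal Python sketch. It assumes the usual reading of completeness, namely that the attributions sum to the difference between the model output at the input and at a baseline; the paper's formal definition may differ in detail. The linear model, the helper names `linear_model` and `linear_attribution`, and the numbers are hypothetical, and the sketch does not reproduce the impossibility result itself.

```python
def linear_model(weights, bias):
    """Return f(x) = bias + sum_i weights[i] * x[i]."""
    def f(x):
        return bias + sum(w * xi for w, xi in zip(weights, x))
    return f


def linear_attribution(weights, x, baseline):
    """Attribute w_i * (x_i - baseline_i) to feature i.

    For a linear model this is the attribution one would expect from a
    complete method with the given baseline (assumption for illustration).
    """
    return [w * (xi - bi) for w, xi, bi in zip(weights, x, baseline)]


weights = [2.0, -1.0, 0.5]          # hypothetical model parameters
f = linear_model(weights, bias=3.0)
x = [1.0, 4.0, 2.0]                 # hypothetical input
baseline = [0.0, 0.0, 0.0]          # hypothetical baseline

attributions = linear_attribution(weights, x, baseline)
gap = f(x) - f(baseline)

# Completeness check: attributions account for the whole output gap.
print(attributions, sum(attributions), gap)
```

The check at the end only verifies the bookkeeping property; whether such attributions help with a given end-task is exactly what the talk calls into question.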

Video recording

May 7 Hidde Fokkema Attribution-based Explanations that Provide Recourse Cannot be Robust
Abstract

Since most machine learning systems are not inherently interpretable, a class of explainable machine learning methods tries to attribute importance of the input features to the outcome of the model. We show that two often proposed requirements of good attribution-based explanations are actually mathematically incompatible. The first requirement is to provide recourse to users: if the user is unhappy with the decision, the explanation should tell them what they would need to change to improve the decision. The second requirement is robustness: small changes in a user's features (e.g. due to rounding or measurement errors) should not cause large changes in the explanations. We show that no method can always provide recourse and be robust, even though both properties can be guaranteed individually. For some restricted sets of models, it is still possible for an attribution method to be robust and provide recourse, and I will discuss some examples where this occurs. However, the message will be that these classes are often simple enough that they do not warrant an explanation. I will further illustrate our findings with counterexamples to at least one of the requirements for popular explanation methods like SHAP, LIME, Integrated Gradients and SmoothGrad.
This talk is based on joint work with Rianne de Heide and Tim van Erven.
Paper: https://jmlr.org/papers/v24/23-0042.html

Video recording

April 4 Damien Garreau A Sea of Words: An In-Depth Analysis of Anchors for Text Data
Abstract

Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it proposes to explain a decision by highlighting a small set of words (an anchor) such that the model to explain has similar outputs when they are present in a document. In this talk, I will present a first attempt to theoretically understand Anchors, considering that the search for the best anchor is exhaustive. I will give explicit results on shortcut models and linear models when the vectorization step is TF-IDF, and word replacement is a fixed out-of-dictionary token.
Paper: https://proceedings.mlr.press/v206/lopardo23a.html