The seminar is an international online event focused on the theoretical foundations of interpretable and explainable AI. Its goals are to foster the exchange of ideas and to build a supportive community for researchers interested in the topic.
Practicalities
- Monthly seminar, 15:00 Central European Time (CET) / 9:00 am Eastern Standard Time (EST)
- Zoom link: https://uva-live.zoom.us/j/87120549999
- Sign up:
Organizers:
Schedule
2024/2025
October 10 | Ulrike von Luxburg | ... |
Abstract... |
September 5 | Lesia Semenova | ... |
Abstract... |
2023/2024
July 11 | Sanjoy Dasgupta | ... |
Abstract... |
June 20 | Blair Bilodeau | Impossibility Theorems for Feature Attribution |
Abstract: Despite a sea of interpretability methods that can produce plausible explanations, the field has also empirically seen many failure cases of such methods. In light of these results, it remains unclear for practitioners how to use these methods and choose between them in a principled way. In this paper, we show that for moderately rich model classes (easily satisfied by neural networks), any feature attribution method that is complete and linear—for example, Integrated Gradients and SHAP—can provably fail to improve on random guessing for inferring model behaviour. Our results apply to common end-tasks such as characterizing local model behaviour, identifying spurious features, and algorithmic recourse. One takeaway from our work is the importance of concretely defining end-tasks: once such an end-task is defined, a simple and direct approach of repeated model evaluations can outperform many other complex feature attribution methods. Paper: https://arxiv.org/abs/2212.11870 |
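To make the paper's takeaway concrete: for a defined end-task such as "does the model locally depend on feature j at all?", repeated model evaluations answer the question directly. Below is a minimal, illustrative Python sketch of that baseline; the toy model `f` and the helper `feature_matters` are assumptions for demonstration, not code from the paper.

```python
import numpy as np

# Hypothetical black-box model: only feature 0 matters; feature 1 is spurious.
def f(x):
    return np.tanh(3.0 * x[0])

def feature_matters(model, x, j, n_samples=100, tol=1e-3, seed=0):
    """Repeated-evaluation baseline (illustrative): resample feature j and
    check whether the model output ever moves by more than tol."""
    rng = np.random.default_rng(seed)
    base = model(x)
    for _ in range(n_samples):
        x_pert = x.copy()
        x_pert[j] = rng.normal()             # perturb only feature j
        if abs(model(x_pert) - base) > tol:  # output changed -> feature is used
            return True
    return False

x = np.array([0.5, -1.2])
print([feature_matters(f, x, j) for j in range(2)])  # -> [True, False]
```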
May 7 | Hidde Fokkema | Attribution-based Explanations that Provide Recourse Cannot be Robust |
Abstract: Since most machine learning systems are not inherently interpretable, a class of explainable machine learning methods tries to attribute the importance of the input features to the outcome of the model. We show that two often-proposed requirements for good attribution-based explanations are in fact mathematically incompatible. The first requirement is to provide recourse to users: if a user is unhappy with the decision, the explanation should tell them what they would need to change to improve it. The second requirement is robustness: small changes in a user's features (e.g. due to rounding or measurement errors) should not cause large changes in the explanations. We show that no method can always provide recourse and be robust, even though each property can be guaranteed individually. For some restricted classes of models it is still possible for an attribution method to be robust and provide recourse, and I will discuss some examples where this occurs. However, the message will be that these classes are often simple enough that they do not warrant an explanation. I will further illustrate our findings with counterexamples to at least one of the two requirements for popular explanation methods such as SHAP, LIME, Integrated Gradients, and SmoothGrad. |
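The tension described in the abstract can be seen already in one dimension. Here is a hedged toy illustration (this construction is mine, not necessarily the paper's): take a classifier whose favourable region is |x| > 1, so that near x = 0 the cheapest recourse points in opposite directions for inputs on either side, forcing any recourse-providing attribution to jump.

```python
# Toy 1-D classifier: favourable label iff |x| > 1.
f = lambda x: float(abs(x) > 1.0)

def recourse_attr(x):
    """An attribution that provides recourse: it points from x toward the
    nearest favourably-labelled input (the nearer of -1 and +1)."""
    target = 1.0 if x >= 0 else -1.0
    return target - x

x1, x2 = 0.01, -0.01              # nearly identical inputs
print(recourse_attr(x1))          # ->  0.99  ("increase x")
print(recourse_attr(x2))          # -> -0.99  ("decrease x")
# The inputs differ by 0.02, yet the explanations differ by ~1.98:
# providing recourse here rules out robustness near x = 0.
```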
April 4 | Damien Garreau | A Sea of Words: An In-Depth Analysis of Anchors for Text Data |
Abstract: Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it explains a decision by highlighting a small set of words (an anchor) such that the model being explained has similar outputs whenever those words are present in a document. In this talk, I will present a first attempt to theoretically understand Anchors, assuming that the search for the best anchor is exhaustive. I will give explicit results for shortcut models and for linear models when the vectorization step is TF-IDF and word replacement uses a fixed out-of-dictionary token. |
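To fix ideas, here is a minimal Python sketch of the setting the abstract describes: an exhaustive anchor search over a TF-IDF + linear model, with non-anchor words replaced by a fixed out-of-dictionary token. The toy corpus, the 0.5 keep-probability for non-anchor words, and the 0.95 precision threshold are illustrative assumptions, not values from the paper.

```python
from itertools import combinations
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus and linear model on TF-IDF features (illustrative).
corpus = ["good movie great plot", "bad movie awful plot",
          "great acting good fun", "awful pacing bad script"]
labels = [1, 0, 1, 0]
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(corpus), labels)

rng = np.random.default_rng(0)

def precision(words, anchor, n=200):
    """Estimated precision of an anchor: keep anchor words, replace each other
    word with probability 0.5 by a fixed out-of-dictionary token, and measure
    how often the prediction matches the one for the original document."""
    orig = clf.predict(vec.transform([" ".join(words)]))[0]
    hits = 0
    for _ in range(n):
        sample = [w if i in anchor or rng.random() < 0.5 else "UNKTOKEN"
                  for i, w in enumerate(words)]
        hits += clf.predict(vec.transform([" ".join(sample)]))[0] == orig
    return hits / n

def exhaustive_anchor(text, threshold=0.95):
    """Exhaustive search: smallest word subset whose presence keeps the
    prediction stable with precision at least `threshold`."""
    words = text.split()
    for size in range(1, len(words) + 1):
        for anchor in combinations(range(len(words)), size):
            if precision(words, set(anchor)) >= threshold:
                return [words[i] for i in anchor]

print(exhaustive_anchor("good movie great plot"))
```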