On the Trade-Off between Actionable Explanations and the Right to be Forgotten

Martin Pawelczyk
University of Tübingen
first.last@uni-tuebingen.de
&Tobias Leemann
University of Tübingen
first.last@uni-tuebingen.de
&Asia Biega
Max-Planck Institute for Security and Privacy
first.last@acm.org
&Gjergji Kasneci¹
University of Tübingen
first.last@uni-tuebingen.de
Equal senior author contribution.

¹footnotemark: 1

Abstract

As machine learning (ML) models are increasingly being deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the “right to be forgotten” which gives users the right to have their data deleted. Another key principle is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates of the predictive model. For the setting of linear models and overparameterized neural networks – studied through the lens of neural tangent kernels (NTKs) – we suggest a framework to identify a minimal subset of critical training points, which when removed, would lead to maximize the fraction of invalidated recourses. Using our framework, we empirically establish that the removal of as little as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Thus, our work raises fundamental questions about the compatibility of “the right to an actionable explanation” in the context of the “right to be forgotten”.

1 Introduction

Machine learning (ML) models make a variety of consequential decisions in domains such as finance, healthcare, and policy. To protect users, laws such as the European Union’s General Data Protection Regulation (GDPR) [regulation2016regulation] or the California Consumer Privacy Act (CCPA) [ccpa2021] constrain the usage of personal data and ML model deployments. For example, individuals who have been adversely impacted by the predictions of these models have the right to recourse [voigt2017eu]. Several approaches in recent literature tackled the problem of providing recourses by generating local (instance level) counterfactual explanations [wachter2017counterfactual, Ustun2019ActionableRI, karimi2019model, FACE, van2019interpretable].¹¹1Note that the terms counterfactual explanations [wachter2017counterfactual], contrastive explanations [karimi2020survey], and recourse [Ustun2019ActionableRI] are used interchangeably in prior literature. Counterfactual/contrastive explanations serve as a means to provide recourse to individuals with unfavorable algorithmic decisions. We use these terms interchangeably to refer to the notion introduced and defined by [wachter2017counterfactual]. For instance, Wachter et al. [wachter2017counterfactual] proposed a gradient based approach which finds the nearest counterfactual resulting in the desired prediction. Ustun et al. [Ustun2019ActionableRI] proposed an integer programming based approach to obtain actionable recourses for linear classifiers. More recently, Karimi et al. [karimi2021algorithmic] advocated for incorporating the causal structure of the underlying data when generating recourses.

Complementarily, data protection laws provide users with greater authority over their personal data. For instance, users are granted the right to withdraw consent to the usage of their data at any time [finckreviving]. These regulations affect technology platforms that train their ML models on personal data of users under the respective legal regime. Legal scholars have argued that the continued use of machine learning models relying on deleted data instances could be deemed illegal [ginart2019making, villaronga2018humans].

Irrespective of the underlying mandate, data deletion has raised a number of algorithmic research questions. In particular, recent literature has focused on the efficiency of deletion (i.e., how to delete individual data points without retraining the model [wu2020deltagrad, ginart2019making, izzo2021, Golatkar_2020_CVPR, golatkar2020forget]) and model accuracy aspects of data deletion (i.e., how to remove data without compromising the accuracy of models [biega2020dm, goldsteen2021data, shanmugam2021learning]). An aspect of data deletion which has not been examined before is whether and how data deletion may impact model explanation frameworks. Thus, there is a need to understand and systematically characterize the limitations of recourse algorithms when personal user data may need to be deleted from trained ML models. Indeed, deletion of certain data instances might invalidate actionable model explanations – both for the deleting user and, critically, unsuspecting other users. Such invalidations can be especially problematic in cases where users have already started to take often costly actions to change their model outcomes based on previously received explanations.

(a) Recourse Outcome Instability

(b) Recourse Action Instability

Figure 1: Visualizing the two key settings we consider in this work. In figure 0(a), the recourse

{~ x}_{1}

for an input

x

is invalidated due to a model update. In figure 0(b), the recourse is additionally recomputed (i.e.,

{~ x}_{ω}

) to avoid recourse invalidation.

In this paper, we formally examine the problem of algorithmic recourse in the context of data deletion requests from two perspectives. First, we consider a scenario where an individual has received a recourse, and afterwards the model needs to be updated since a small set of individuals has decided to withdraw their data. Hence, we ask: What is the worst impact that a deleted data instance can have on the recourse validity, i.e., will the outdated recourse still lead to a positive prediction? This setup is depicted in Fig. 0(a). We will refer to this as Recourse Outcome Instability.

Second, we again assume that an individual has received a recourse, and the model will be updated due to data deletion requests. In this case, the recourse action would need to be updated as a consequence of the model update. Hence, we ask: What is the worst impact that a deleted data instance can have on the updated recourse action? What maximal change in recourse will be required to maintain a positive prediction? The change in the recourse action is shown in Fig. 0(b) (black arrow) and will be termed Recourse Action Instability.

Given these measures, we derive and analyze theoretical worst-case guarantees of the maximal instability induced for linear models and neural networks in the overparameterized regime. We furthermore define an optimization problem for empirically quantifying recourse instability under data deletion. For a given trained machine learning model, we are interested in finding the smallest set of data points the deletion of which maximizes the proposed instability measures. Since the resulting brute-force approach (i.e., retraining models for every possible removal set) is intractable, we propose a relaxation for recourse instability maximization, that can be optimized using end-to-end gradient descent or via a greedy approximation algorithm.

To summarize, in this work we make the following key contributions: First, we formulate the problem of recourse invalidation under the right to be forgotten by defining two new instability measures. Second, through rigorous theoretical analysis, we determine conditions under which recourses output by state-of-the-art methods are invalidated through data deletion requests from users whose data is part of the training set. Third, using our instability measures, we present an optimization framework for effectively evaluating the vulnerability of algorithmic recourse to data deletion requests. Finally, we conduct extensive experiments on multiple real-world data sets across both regression and classification tasks, showing that the removal of even one point from the training set can lead to invalidation of up to 95 percent of all recourses.

2 Related Work

Algorithmic Approaches to Recourse. As discussed earlier, several approaches have been proposed in literature to provide recourse to individuals who have been negatively impacted by model predictions [tolomei2017interpretable, laugel2017inverse, Dhurandhar2018, wachter2017counterfactual, Ustun2019ActionableRI, joshi2019towards, van2019interpretable, pawelczyk2019, mahajan2019preserving, mothilal2020fat, karimi2019model, rawal2020interpretable, karimi2020probabilistic, dandl2020multi, antoran2020getting, spooner2021counterfactual, albini2022]. These approaches can be roughly categorized along the following dimensions [verma2020counterfactual]: type of the underlying predictive model (e.g., tree based vs. differentiable classifier), type of access they require to the underlying predictive model (e.g., black box vs. gradients), whether they encourage sparsity in counterfactuals (i.e., only a small number of features should be changed), and whether the underlying task is posed as a regression or classification problem. The aforementioned approaches generate recourses assuming a static environment without data deletion requests, where both the predictive model and the recourse remain stable. A related line of work has also focused on determining the extent to which recourses remain invariant to the choice of the underlying model [pawelczyk2020multiplicity, black2021consistent], changes due to data distribution shifts [rawal2021modelshifts, upadhyay2021robust], small perturbations to the input instances [artelt2021evaluating, dominguezolmedo2021adversarial, slack2021counterfactual], or small perturbation to the recourses [pawelczyk2022noisy]. The works by [pawelczyk2020multiplicity, rawal2021modelshifts, artelt2021evaluating, dominguezolmedo2021adversarial, pawelczyk2022noisy] provide theoretical analyses of (a) the extra cost associated with algorithmic recourse under model multiplicity, and (b) the setting in which the input instance for which recourse is being computed may itself be noisy.

Our work, in contrast, addresses the problem of recourse in the presence of data deletion requests. While we do not propose a new recourse method, we suggest effective algorithms to delete a minimal subset of training points to invalidate a large fraction of recourses output by a given method.

Sample Deletion in Predictive Models. Since EU’s GDPR requires that individuals can request to have their data deleted, several approaches in recent literature have been focusing on updating a machine learning model without the need of retraining the entire model from scratch [wu2020deltagrad, ginart2019making, izzo2021, Golatkar_2020_CVPR, golatkar2020forget, cawley2004fast]. A related line of work considers the problem of data valuation [ghorbani2020distributional, ghorbani19shapley]. Finally, removing subsets of training data is an essential ingredient used for model debugging [doshi2017rigorous] or the evaluation of explanation techniques [hooker2019bench, rong2022evaluating].

While prior research has focused on effective data removal strategies for preditive models [wu2020deltagrad, ginart2019making, izzo2021, Golatkar_2020_CVPR, golatkar2020forget, neel2021deletion], the robustness of state-of-the-art recourse methods to distribution shifts and random perturbations [upadhyay2021robust, rawal2021modelshifts, dominguezolmedo2021adversarial, pawelczyk2022noisy], there is no prior work that studies to what extent recourses output by state-of-the-art methods are affected by data deletion requests. Our work is the first to tackle this important problem.

3 Preliminaries

3.1 The Predictive Model and the Data Deletion Mechanism

We consider prediction problems from some input space $R^{d}$ to an output space $Y$ , where $d$ is the number of input dimensions. We denote a sample by $z = (x, y)$ , and denote the training data set by $D = {z_{1}, \dots, z_{n}}$ . Next, consider the weighted empirical risk minimization problem (ERM), which gives rise to the optimal model parameters:

w (ω) = a r g m i n w^{'} n \sum i = 1 ω_{i} \cdot ℓ (y_{i}, f_{w^{'}} (x_{i})),

(1)

where $ℓ (\cdot, \cdot)$ is an instance-wise loss function (e.g., binary cross-entropy, mean-squared-error (MSE) loss, etc.) and $ω \in {0, 1}^{n}$ are data weights that are fixed at training time. If $ω_{i} = 1$ , then the point $z_{i} = (x_{i}, y_{i})$ is part of the training data set, otherwise it is not. During model training, we set $ω_{i} = 1 \forall i$ , that is, the decision maker uses all available training instances at training time. In the optimization expressed in (1), the model parameters $w$ are usually an implicit function of the data weight vector $ω$ and we write $w (ω)$ to highlight this fact; in particular, when all training instances are used we write $w (1)$ , where $1 \in R^{n}$ is a vector of 1s. In summary, we have introduced the weighted ERM problem since it allows us to understand the impact of arbitrary data deletion patterns on actionable explanations as we allow users to withdraw their entire input $z_{i} = (y_{i}, x_{i})$ from the training set used to train the model $f_{w (1)}$ . Next, we present the recourse model we consider.

3.2 The Recourse Problem in the Context of the Data Deletion Mechanism

We follow an established definition of counterfactual explanations originally proposed by [wachter2017counterfactual]. For a given model $f_{w (ω)} : R^{d} \to R$ parameterized by $w$ and a distance function $d (\cdot, \cdot) : X \times X \to R_{+}$ , the problem of finding a counterfactual explanation $ˇ x = x + δ$ for a factual instance $x$ is given by:

δ_{ω, x} \in a r g m i n δ^{'} \in A_{d} (f_{w (ω)} (x + δ^{'}) - s)^{2} + λ \cdot d (x, x + δ^{'}),

(2)

where $λ \geq 0$ is a scalar tradeoff parameter and $s$ denotes the target score. In the optimization expressed in (2), the optimal recourse action $δ$ usually depends on the model parameters and since the model parameters themselves depend on the data weights we write $δ_{ω, x}$ to highlight this fact. Moreover, a specific recourse implementation $δ (w (ω), x) = δ_{ω, x}$ is constrained to return a recourse as specified by equation (2). The first term in the objective on the right-hand-side of (2) encourages the outcome $f_{w (ω)} (ˇ x)$ to become close to the user-defined target score $s$ , while the second term ensures that the distance between the factual instance $x$ and the recourse ${ˇ x}_{ω} := x + δ_{ω, x}$ is low. The set of constraints $A_{d}$ ensures that only admissible changes are made to the factual $x$ , e.g., $A_{d}$ could specify that no changes to a protected feature such as ‘race’ can be made.

3.3 Definitions of Recourse Robustness Through the Lens of the Right-to-be-Forgotten

One of the key goals of this work is to understand if and when recourses suggested by state-of-the-art methods are invalidated upon data deletion requests when the model needs to be updated as a consequence. We first introduce several key terms, namely, prescribed recourses and recourse outcomes. A prescribed recourse refers to a recourse that was provided to an end user by a recourse method (e.g., increase salary by $500). The recourse outcome is the model’s prediction evaluated at the recourse. With these concepts in place, we develop two recourse instability definitions.

Definition 1.

(Recourse outcome instability) The recourse outcome instability with respect to a factual instance $x$ , where at least one data weight is set to $0$ , is defined as follows:

Δ (ω) = ∣ ∣ f_{w (1)} ({ˇ x}_{1}) - f_{w (ω)} ({ˇ x}_{1}) ∣ ∣,

(3)

where $f_{w (1)} ({ˇ x}_{1})$ is the prediction at the prescribed recourse ${ˇ x}_{1}$ based on the model that uses the full training set (i.e., $f_{w (1)}$ ) and $f_{w (ω)} ({ˇ x}_{1})$ is the prediction at the prescribed recourse for an updated model and data deletion requests have been incorporated into the predictive model (i.e., $f_{w (ω)}$ ).

The above definition concisely describes the effect of applying “outdated” recourses to the updated model. We assume that only the model parameters are being updated while the prescribed recourses remain unchanged. For a discrete model with $Y = {0, 1}$ , Definition 1 captures to what extent the prescribed recourses will still be valid after deletion of training instances. To capture the invalidation rates of recourses for a continuous-score model with target value $s$ , we can apply Definition 1 with a discretized $f^{'} (x) = I [f (x) > s]$ , where $I$ denotes the indicator function.

In Definition 2 below we allow the recourse to change due to model parameter updates and specify the distance function $d$ to be a p-norm which is consistent with related work (e.g., [karimi2019model, wachter2017counterfactual]).

Definition 2.

(Recourse action instability) The Recourse action instability with respect to a factual input $x$ , where at least one data weight is set to $0$ , is defined as follows:

Φ_{p} (ω) = ∥ ∥ {ˇ x}_{1} - {ˇ x}_{ω} {∥ ∥}_{p},

(4)

where $p \in [1, \infty)$ , and ${ˇ x}_{ω}$ is the recourse obtained for the model trained on the data instances that remain present in the data set after the deletion request.

Definition 2 quantifies the extent to which the prescribed recourses after data deletion requests (i.e., ${ˇ x}_{ω}$ ) would have to additionally change to still achieve the desired recourse outcome (e.g., loan approval). Using our invalidation measures defined above, in the next section, we formally study the trade-offs between actionable explanations and the right-to-data deletion. To do so we provide data dependent upper bounds on the invalidation measures from Definitions 1 and 2, which practitioners can use to probe the worst-case vulnerability of algorithmic recourse to data deletion requests.

4 Analysing the Trade-off between Algorithmic Recourse and the Right to be Forgotten

Here we relate the two notions of recourse instability presented in Definitions 1 and 2 to the vulnerability of the underlying predictive model with respect to data deletion requests. Studying the relation between deletion requests and the robustness of algorithmic recourse for models as complex as neural networks requires recent results from computational learning theory. In particular, we rely on insights on the behaviour of over-parameterized neural networks from the theory of Neural Tangent Kernels, which we will now briefly introduce.

A brief introduction to NTK Theory. We study these robustness notions by considering both linear and neural network models in the overparameterized regime with ReLU activation functions that take the following form:

f_{ANN} (x) = \frac{1}{\sqrt{k}} k \sum j = 1 a_{j} σ (w_{j}^{⊤} x),

(5)

where $W = [w_{1}, \dots, w_{k}] \in R^{d \times k}$ and $a = [a_{1}, \dots, a_{k}] \in R^{k}$ . To concretely study the impact of data deletion on recourses in non-linear models such as neural networks, we leverage ideas from the neural tangent kernel (NTK) literature [jacot2018neural, lee2019wide, arora2019exact, du2018gradient]. The key insight from this literature for the purpose of our work is that infinitely wide neural networks can be expressed as a kernel ridge regression problem with the NTK under appropriate parameter initialization, and gradient descent training dynamics. In particular, in the limit as the number of hidden nodes $k \to \infty$ , the neural tangent kernel associated with a two-layer ReLU network has a closed-form expression [chen2021deep, zhang2021rethinking] (see Appendix A.4 for a derivation):

K^{\infty} (x_{0}, x) = \frac{x_{0}^{⊤} x (π - arcos (\frac{x_{0}^{⊤} x}{∥ x_{0} ∥ ∥ x ∥}))}{2 π} .

(6)

Thus, the network’s prediction at an input $x$ can be described by:

f_{NTK} (x)

= (K^{\infty} (x, X))^{⊤} w_{NTK},

(7)

where $X \in R^{n \times d}$ is the input data matrix, $K^{\infty} (X, X) \in R^{n \times n}$ is the NTK matrix evaluated on the training data points: $[K^{\infty} (X, X)]_{i j} = K^{\infty} (x_{i}, x_{j})$ and $w_{NTK} = (K^{\infty} (X, X) + β I_{n})^{- 1} Y$ solves the $ℓ_{2}$ regularized empirical risk minimization problem with MSE loss where $Y \in R^{n}$ is the vector of prediction targets.

Analyzing Our Instability Measures. With the basic terminology in place, we provide upper bounds for the recourse outcome instability of linear and overparameterized neural network models.

Proposition 1 (Upper bound on recourse outcome instability for linear models).

For the linear regression model $f (x) = w_{L}^{⊤} x$ with model parameters $w_{L} = (X^{⊤} X)^{- 1} X^{⊤} Y$ , an upper bound for the recourse invalidation from Definition 1 by removing an instance from the training set is given by:

Δ \leq ∥ {ˇ x}_{1} ∥_{2} \cdot max i \in [n] ∥ d_{i}^{L} ∥_{2},

(8)

where $d_{i}^{L} = (X^{⊤} X)^{- 1} x_{i} \cdot \frac{r_{i}}{1 - h_{i i}}$ , $r_{i} = y_{i} - w_{L}^{⊤} x_{i}$ and $h_{i i} = x_{i}^{⊤} (X^{⊤} X)^{- 1} x_{i}$ .

Intuitively, $d_{i}^{L}$ represents the importance that the training point $(x_{i}, y_{i})$ has on the model parameters $w_{L}$ . It measures how much the model parameters $w$ are affected by deleting the $i$ -th instance from the model. The term $d_{i}^{L}$ is also known as the empirical influence function in the statistics literature [cook1980characterizations]. The term $(X^{⊤} X)^{- 1} x_{i} = \frac{d w_{L}}{} d y_{i}$ describes how sensitive the model parameters are to $y_{i}$ , while the residual $r_{i}$ captures how well $y_{i}$ can be fit by the model. On the contrary, the term $h_{i i}$ from the denominator is known as the leverage and describes how atypical $x_{i}$ is with respect to all training inputs $X$ . In summary, data instances that have influential labels, are atypical, and cannot be well fit by the model will have the highest impact when deleted.

Proposition 2 (Upper bound on recourse outcome instability for wide neural networks).

For the NTK model with $w_{NTK} = (K^{\infty} (X, X) + β I_{n})^{- 1} Y$ , an upper bound for the recourse invalidation from Definition 1 by removing an instance $(x, y)$ from the training set is given by:

Δ \leq ∥ K^{\infty} ({ˇ x}_{1}, X) ∥_{2} \cdot max i \in [n] ∥ d_{i}^{NTK} ∥_{2},

(9)

where $d_{i}^{NTK} = \frac{1}{k_{i i}} k_{i} k_{i}^{⊤} Y$ , where $k_{i}$ is the $i$ -th column of the matrix $(K^{\infty} (X, X) + β I_{n})^{- 1}$ , and $k_{i i}$ is its $i$ -th diagonal element.

Intuitively, $d_{i}^{NTK}$ is the linear model analog to $d_{i}^{L}$ and $d_{i}^{NTK}$ represents the importance that the point $z_{i} = (x_{i}, y_{i})$ has on the model parameters $w_{NTK}$ . Next, we provide a generic upper bound on recourse action instability.

Proposition 3 (Upper bound on recourse action instability).

For any predictive model with scoring function $f : R^{d} \to R$ , an upper bound for the recourse instability from Definition 2 by removing an instance $z_{i} = (x, y)$ from the training set is given by:

Φ

\leq ∥ d_{i} ∥_{2} \int_{0}^{1} ∥ ∥ ∥ \frac{D δ}{D w} (~ w) {∥ ∥ ∥}_{2} d γ,

(10)

where $\frac{D δ}{D w}$ denotes the Jacobian of optimal recourse with the corresponding operator matrix norm, $~w\coloneqqγw+(1−γ)w−i$ with $w_{- i}$ being the optimal model parameters with the i-th training instance removed from the training set, and $∥ w - w_{- i} ∥_{2} = ∥ d_{i} ∥_{2}$ .

The norm of the Jacobian of optimal recourse indicates the local sensitivity of optimal recourse with respect to changes in model parameters $w$ . High magnitudes indicate that a small change in the parameters may require a fundamentally different recourse action. The total change can be bounded by the integral over these local sensitivities, which means that low local sensitivities along the path will result in a low overall change. Next, we specialize this result to the case of linear models.

Corollary 1 (Upper bound on recourse action instability for linear models).

For the linear regression model $f (x) = w_{L}^{⊤} x$ with model parameters given by $w_{L} = (X^{⊤} X)^{- 1} X^{⊤} Y$ , an upper bound for the recourse action instability in the setting $s = 0, λ = 0$ by removing an instance from the training set is given by:

(11)

under the condition that $w_{L}^{⊤} w_{L, - i} \geq 0$ (no diametrical weight changes), where $w_{L, - i} = w_{L} - d_{i}^{L}$ is the weight after removal of training instance $i$ and $d_{i}^{L} = (X^{T} X)^{- 1} x_{i} \frac{(y_{i} - w_{L}^{⊤} x_{i})}{1 - h_{i i}}$ .

For models trained on large data sets, the absolute value of the model parameters’ norm $∥ w_{L} ∥$ will not change much under deletion of a single instance. Therefore we argue that the denominator $min (∥ w_{L} ∥_{2}, {min}_{i \in [n]} ∥ w_{L, - i} ∥_{2}) \approx ∥ w_{L} ∥$ . Thus, recourse action instability is mainly determined by the sensitivity of model parameters to deletion, ${max}_{i \in [n]} ∥ d_{i}^{L} ∥_{2}$ , scaled by the ratio of $\frac{∥ x ∥_{2}}{∥ w_{L} ∥_{2}}$ .

In practical use-cases, when trying to comply with both the right to data deletion and the right to actionable explanations, our results have several key implications. Instances with high influence captured by $d_{i}$ should be encountered with caution during model training in order to provide reliable recourses to the individuals who seek recourse. In summary, our results suggest that the right to data deletion may be fundamentally at odds with reliable state-of-the-art actionable explanations as the removal of an influential data instance can induce a large change in the recourse robustness, the extent to which is primarily measured by the empirical influence function $d_{i}$ .

5 Finding the Set of Most Influential Data Points

In this section, we present optimization procedures that can be readily used to assess recourses’ vulnerability to deletion requests. On this way, we start by formulating our optimization objective. We denote by $m \in {Δ, Φ_{p}}$ the measure we would like to optimize for. Our goal is to find the smallest number of deletion requests that leads to a maximum impact on the instability measure $m$ . To formalize this objective, we define the set of possible data weight configurations:

\begin{matrix} Γ_{α} := {ω : % Maximally ⌊ α \cdot n ⌋ entries of ω % are 0 and the remainder is 1.} . \end{matrix}

(12)

In (12), the parameter $α$ controls the fraction of instances that are being removed from the training set. For a fixed fraction $α$ , our problem of interest becomes:

ω^{*} = a r g m a x ω \in Γ_{α} m (ω) .

(13)

Note however that the dependency of the metric $m$ on the data weights $ω$ has no closed form expression in general. Instead, the relations are defined implicitly as solutions to several, non-linear optimization problems. Explicitly, we have $Δ (ω) = Δ (f (w (ω), ˇ x))$ for the recourse outcome instability and $Φ_{p} (ω) = Φ_{p} (δ (w (ω), x)))$ . In the following paragraphs, we will briefly discuss how we compute each of these functions.

Model parameters from data weights $w (ω) .$ For the linear model, an analytical solution can be obtained, $w_{L} (ω) = (X^{⊤} Ω X)^{- 1} X^{⊤} Ω Y$ , where $Ω = diag (ω)$ . The same goes for the NTK model where [busuttil2007weighted, Eqn. 3]. When no closed-form expressions for the model parameters exist, we can resort to the infinitesimal jackknife (IJ) [jaeckel1972infinitesimal, efron1982jackknife, giordano2019swiss, giordano2019highswiss], that can be seen as a linear approximation to this implicit function. We refer to Appendix C for additional details on this matter.

Model prediction from model parameters $f (w, ˇ x) .$ Having established the model parameters, evaluating the prediction at a given point is simple and can be done in a differentiable manner for the models we consider in this work.

Recourse action from model parameters $δ (w, ˇ x) .$ Estimating the recourse action is more challenging as it requires solving Eqn. (2). However a differentiable solution exists for linear models, where the optimal recourse action is given by $δ_{L} = \frac{s - w_{L} (ω)^{⊤} x}{λ + ∥ w_{L} (ω) ∥_{2}^{2}} w_{L} (ω)$ . When the underlying predictor is a wide neural network we can approximate the recourse expression of the corresponding NTK, $δ_{NTK} \approx \frac{s - f_{ω, NTK} (x)}{} λ + ∥ {¯ w}_{NTK} (¯ ω) ∥_{2}^{2} {¯ w}_{NTK} (ω)$ , which stems from the first-order taylor expansion $f_{ω, NTK} (x + δ) \approx f_{ω, % N T K} (x) + δ^{⊤} {¯ w}_{NTK} (ω)$ with ${¯ w}_{NTK} (ω) = \nabla_{x} K (x, X) w_{NTK} (ω)$ .

The Optimal Yet Intractable Approach. Having established a way to compute $m (ω)$ , we need to optimize the discrete objective in (13). The brute-force solution would use the following approach: First, choose a data subset that deletes maximally $100 α$ percent of the data instances from the training set. Second, for each such subset, retrain the model. Finally, choose to remove the data points where the removal leads to the highest instability. To illustrate why this method is intractable, let $r = ⌊ 100 α ⌋$ . In this setting, we have to retrain the predictive model $(\frac{n}{r})$ different times to find the point set.

Two tractable optimization schemes. We propose two relaxations to tackle this problem. First, we resort to a greedy algorithm. Therefore, we consider the model on the full data set and compute the objective function $m (ω)$ under deletion of every instance (alone). We then select the instance that leads to the highest increase in the objective. We add this instance to the set of deleted points. Subsequently, we refit the model and compute the impact of deletion for every second instance, when deleted in combination with the first one. Again, we add the instance that results in the largest increase to the set. Iteratively repeating these steps, we identify more instances to be deleted.

Since this approach is still computationally intensive, we also propose a gradient-based optimization framework. We consider the relaxation of the problem in (13),

ω^{*} = a r g m a x ω \in {0, 1}^{n} m (ω) - ∥ 1 - ω ∥_{0},

(14)

where the $ℓ_{0}$ norm encourages to change as few data weights from $1$ to $0$ as possible while few removals of training instances should have maximum impact on the robustness measure. To find a binary and sparse solution, we use a recently suggested stochastic surrogate loss for the $ℓ_{0}$ term [yamada20a]. Using this technique, this objective can optimized using stochastic Gradient Descent (SGD). We refer the reader to Appendix C for more details and pseudocode of the two algorithms.

6 Experimental Evaluation

We experimentally evaluate our framework in terms of its ability to find significant recourse invalidations using the instability measures presented in Section 3.

6.1 Experimental Setup

Data Sets. For our real world experiments on regression tasks we use two real-world data sets. First, we use law school data from the Law School Admission Council (Admission) used in [kusner2017counterfactual, wachter2017counterfactual]. The council carried out surveys across 163 law schools in the US, in which they collected information from 21,790 law students across the US [wightman1998lsac]. The data contains information on the students’ entrance exams scores, their grade-point average before entering law school, and whether the student passed or failed their first bar-examination. The task is to predict the students’ first-year law-school average grades. Second, we use the Home Equity Line of Credit (Heloc) data set. Here, the target variable is a score indicating whether individuals will repay the Heloc account within a fixed time window. Across both tasks we consider those individuals in need of recourse if their scores lies below the median score across the entire data set.

Recourse Methods. We apply our techniques to four different methods which aim to generate low-cost recourses using different principles: SCFE was suggested by Wachter et al. [wachter2017counterfactual] and uses a gradient-based objective to find recourses, DICE [mothilal2020fat] uses a gradient-based objective to find recourses subject to a diversity constraint, and both REVISE [joshi2019towards] and CEM [Dhurandhar2018] use a generative model to encourage recourses to lie on the data manifold. In the main text, we report the results for CEM only. For all methods, we used the recourse method implementations from the CARLA library [pawelczyk2021carla] and specify the $ℓ_{1}$ cost constraint.

Prediction Models. For all data sets, we trained different predictive models: (a) Linear Regression models and (b) kernel models using the NTK kernel regression. Furthermore, we provide results for classification with (c) Logistic Regression, (d) Kernel-SVM and (e) ANN on two different data sets as well as additional details in Appendix B. All recourses were generated with respect to these models.

6.2 Evaluating our Data Deletion Framework

Evaluation Measures. For the purpose of our evaluation, we use both the recourse outcome instability measure and the recourse action instability measure presented in Definitions 1 and 2. We evaluate the efficacy of our framework to destabilize a large fraction of recourses using a small number of deletion requests (up to 14). To find influential instances, we use the greedy and the gradient-based algorithms described in Sec. 5. After having established a set of influential points. We recompute the metrics with the refitted models and recourses to obtain an ground truth result.

For the recourse outcome instability, our metric $Δ$ counts the number of recourses invalidated. We use the median as the target score $s$ , i.e., if the recourse outcome flips back from a positive leaning prediction (above median) to a negative one (below median) it is considered invalidated. When evaluating recourse action instability, we also identify a set of influential points, delete these points from the training set and refit the predictive model. However in this case, we additionally have to recompute the recourses to evaluate $Φ_{p}$ . We then measure the recourse instability using Definition 2 with $p = 2$ . Additionally, we compare our framework to a random baseline, which deletes points uniformly at random from the training set. We compute these measures for all individuals from the test set who require algorithmic recourse. To obtain standard errors, we split the test set into 5 folds and report averaged results over these 5 folds.

Results. In Figure 2, we measure the tradeoff between recourse outcome instability and the number of deletion requests. We plot the number of deletion requests against the fraction of all recourses that become invalidated when up to $k \in {1, \dots, 14}$ training points are removed from the training set of the predictive model. When the underlying model is linear, we observe that the removal of as few as 5 training points induces invalidation rates of all recourses that are as high as 95 $%$ percent – we observe a similar trend across all recourse methods. Note that a similar trend is present for the NTK model; however, a larger number of deletion requests (roughly 9) is required to achieve similar invalidation rates. Finally, also note that our approach is always much more effective at deleting instances than the random baseline. In Figure 3, we measure the tradeoff between recourse action instability and the number of deletion requests with respect to the SCFE recourse method when the underlying predictive model is linear or an NTK model. For this complex objective, we use the more efficient SGD optimization. Again, we observe that our optimization method significantly outperforms the random baselines at finding the most influential points to be removed.

7 Conclusion

In this work, we made the first step towards understanding the tradeoffs between actionable model explanations and the right to be forgotten. We theoretically analyzed the robustness of state-of-the-art recourse methods under data deletion requests, and suggested two algorithms (greedy as well gradient-based) to efficiently identify a small subset of individuals, whose data, when removed, would lead to invalidation of a large number of recourses for unsuspecting other users. Our experimental evaluation with multiple real-world data sets on both regression and classification tasks demonstrates that the right to be forgotten presents a significant challenge to the reliability of actionable model explanations.

Furthermore, our findings raise compelling questions on the deployment of counterfactual explanations in practice. First of all, Are the two requirements of actionable explanations and the right to be forgotten fundamentally at odds with one another? The theoretical and empirical results in this work indicate that for many model and recourse method pairs, this might indeed be the case. This finding leads to the pressing follow-up question: How can practitioners make sure that their recourses stay valid under deletion requests? A first take might be to implement the principle of Data Minimization [finckreviving, biega2020dm, shanmugam2021learning] in the first place, i.e, use only a minimal amount of data that is necessary to train a model of a satisfactory performance. In this case, most deletion requests would go unheeded, as the data might not be part of the trained ML models.

Additionally, our theoretical results suggest that the robustness to deletion increases when the model’s parameter changes under data deletion, measured by $∥ d_{i} ∥$ , remain small. This formulation closely resembles the definition of Differential Privacy (DP) [dwork2006calibrating, chaudhuri2011differentially, dwork2014algorithmic]. We therefore conjecture that the reliability of actionable recourse could benefit from models that have been trained under DP constraints, independently of the specific recourse method deployed (recall Proposition 3). As the field of AI rapidly evolves, data protection authorities will further refine the precise interpretations of general principles in regulations such as GDPR. The present paper contributes towards this goal algorithmically and empirically by providing evidence of tensions between different data protection principles.

Acknowledgments

MP would like to thank Jason Long, Emanuele Albini, Jiāháo Chén, Danial Dervovic and Daniele Magazzeni for insightful discussions at early stages of this work.

References

Appendix

Appendix A Theoretical Results

a.1 Upper Bounds on Recourse Outcome Instability

Proposition 1 (Upper Bound on Output Robustness for Linear Models).

For the linear regression model $f (x) = w^{⊤} x$ with weights given by $w = (X^{⊤} X)^{- 1} X^{⊤} Y$ , an upper bound for the output robustness by removing an instance $(x, y)$ from the training set is given by:

Δ \leq max i \in [n] ∥ d_{i} ∥_{2} \cdot ∥ {ˇ x}_{1} ∥_{2},

(15)

where $d_{i} = (X^{⊤} X)^{- 1} x_{i} \cdot \frac{r_{i}}{1 - h_{i i}}$ , $r_{i} = y_{i} - w^{⊤} x_{i}$ and $h_{i i} = x_{i}^{⊤} (X^{⊤} X)^{- 1} x_{i}$ .

Proof.

By Definition 1, we have:

$Δ$	$= ∣ ∣ w_{1}^{⊤} {ˇ x}_{1} - w_{- i}^{⊤} {ˇ x}_{1} ∣ ∣$		(16)
	$= ∣ ∣ (w_{1} - w_{- i})^{⊤} {ˇ x}_{1} ∣ ∣$
	$= ∣ ∣ ∣ ((X^{T} X)^{- 1} x_{i} \frac{(y_{i} - w^{T} x_{i})}{1 - h_{i i}})^{⊤} {ˇ x}_{1} ∣ ∣ ∣$	$(by Theorem ???)$	(17)
	$\leq ∥ d_{i} ∥_{2} \cdot ∥ {ˇ x}_{1} ∥_{2}$	$(by Cauchy-Schwartz)$	(18)
	$\leq ∥ {ˇ x}_{1} ∥_{2} \cdot max i \in [n] ∥ d_{i} ∥_{2} \cdot,$		(19)

where $d_{i} = (X^{T} X)^{- 1} x_{i} \frac{(y_{i} - w^{T} x_{i})}{1 - h_{i i}}$ . This completes our proof. ∎

Proposition 2 (Upper Bound on Output Robustness for NTK).

For the NTK model with $w_{NTK} = (K^{\infty} (X, X) + λ I_{n})^{- 1} Y$ , an upper bound for the output robustness by removing an instance $(x, y)$ from the training set is given by:

Δ \leq ∥ K^{\infty} ({ˇ x}_{1}, X) ∥_{2} \cdot max i \in [n] ∥ d_{i} ∥_{2},

(20)

where $d_{i} = \frac{1}{k_{i i}} k_{i} k_{i}^{⊤} Y$ , where $k_{i}$ is the $i$ -th column of the matrix $(K^{\infty} (X, X) + β I_{n})^{- 1}$ , and $k_{i i}$ is its $i$ -th diagonal element.

Proof.

By Definition 1, and the assumption of the over-parameterized regime, we have:

$Δ$	$= ∣ ∣ f_{NTK} ({ˇ x}_{1}) - f_{NTK}^{- i} ({ˇ x}_{1}) ∣ ∣$
	$= ∣ ∣ (K^{\infty} ({ˇ x}_{1}, X))^{⊤} w_{NTK} - (K^{\infty} ({ˇ x}_{1}, X))^{⊤} ((K^{\infty} (X, X) + β I_{n})^{- 1} - \frac{1}{k_{i i}} k_{i} k_{i}^{⊤}) Y ∣ ∣$	(21)
	$= ∣ ∣ (K^{\infty} ({ˇ x}_{1}, X))^{⊤} \frac{1}{k_{i i}} k_{i} k_{i}^{⊤} Y ∣ ∣$	(22)
	$\leq ∥ d_{i} ∥_{2} \cdot ∥ K^{\infty} ({ˇ x}_{1}, X) ∥_{2} (by Cauchy-% Schwartz)$
	$\leq ∥ {ˇ x}_{1} ∥_{2} \cdot max i \in [n] ∥ d_{i} ∥_{2},$	(23)

where $d_{i} = \frac{1}{k_{i i}} k_{i} k_{i}^{⊤} Y$ which completes our proof. ∎

a.2 Upper Bounds on Recourse Action Instability

Proposition 3 (Upper Bound on Input Robustness).

For the linear regression model $f (x) = w^{⊤} x$ with weights given by $w = (X^{⊤} X)^{- 1} X^{⊤} Y$ , an upper bound for the input robustness in the setting $s = 0, λ = 0$ by removing the $i$ -th instance $(x_{i}, y_{i})$ from the training set is given by:

Φ \leq ∥ d_{i} ∥_{2} \frac{4 \sqrt{2} ∥ x ∥_{2}}{min (∥ w ∥_{2}, ∥ w_{- i} ∥_{2})},

(24)

under the condition that $w^{⊤} w_{- i} \leq 0$ (no diametrical weight changes), where $w_{- i} = w - d_{i}$ is the weight after removal of training instance $i$ and $d_{i} = (X^{T} X)^{- 1} x_{i} \frac{(y_{i} - w^{⊤} x_{i})}{1 - h_{i i}}$ .

Proof.

For a linear scoring function $f (x) = w^{' ⊤} x$ with given parameters $w^{'}$ , under the squared $ℓ_{2}$ norm constraint with balance parameter $λ$ , the optimal recourse action is given by [pawelczyk2021connections]:

δ (w^{'}) = \frac{s - w^{' ⊤} x}{∥ w^{'} ∥_{2}^{2} + λ} \cdot w^{'} .

(25)

Using Definition 2, we can express the total change in $δ$ as a path integral over changes in $w$ , times the change $\frac{D δ}{D w}$ they entail:

	$Φ$			(26)
				(27)

where $\frac{D δ}{D w}$ denotes the Jacobian, with the corresponding operator matrix norm. Defining $~w\coloneqqγw+(1−γ)w−i$ and using $∥ w - w_{- i} ∥_{2} = ∥ d_{i} ∥_{2}$ , we obtain

Φ

\leq ∥ d_{i} ∥_{2} \int_{0}^{1} ∥ ∥ ∥ \frac{D δ}{D w} (~ w) {∥ ∥ ∥}_{2} d γ .

(28)

Because of the form $δ (w^{'}) = f (w^{'}) w^{'}$ , where $f(w′)\coloneqqs−w′⊤x∥w′∥22+λ$ is a scalar function, its Jacobian has the form $\frac{D δ}{D w^{'}} = w^{'} {(\nabla f (w^{'}))}^{⊤} + f (w^{'}) I$ . We will now derive a bound on the Jacobian’s operator norm:

	$∥ ∥ ∥ \frac{D δ}{D w^{'}} (~ w) {∥ ∥ ∥}_{2}$			(29)
		$\leq ∥ \nabla f (~ w) ∥_{2} ∥ ~ w ∥_{2} + \| f (~ w) \| .$		(30)

Additionally, we know that for $s = 0$ , $| f (w^{'}) | \leq \frac{∥ x ∥_{2} ∥ ~ w ∥_{2}}{∥ ~ w ∥_{2}^{2}} = \frac{∥ x ∥_{2}}{∥ ~ w ∥_{2}}$ . The gradient is given by

$∥ \nabla f (~ w) ∥_{2}$	$= ∥ ∥ ∥ \frac{- (∥ ~ w ∥_{2}^{2} + λ) x - 2 (s - {~ w}^{⊤} x) ~ w}{} (∥ ~ w ∥_{2}^{2} + λ)^{2} {∥ ∥ ∥}_{2}$		(31)
	$\leq \frac{(∥ ~ w ∥_{2}^{2} + λ) ∥ x ∥_{2} + 2 (s + ∥ ~ w ∥_{2} ∥ x ∥_{2}) ∥_{2} ~ w ∥_{2}}{∥ ~ w ∥_{2}^{4}}$		(32)
	$= \frac{3 ∥ x ∥_{2}}{∥ ~ w ∥_{2}^{2}}$	$(Using λ \to 0, s = 0) .$	(33)

In summary,

∥ ∥ ∥ \frac{D δ}{D w^{'}} (~ w) {∥ ∥ ∥}_{2} \leq \frac{3 ∥ x ∥_{2}}{∥ ~ w ∥_{2}^{2}} ∥ ~ w ∥_{2} + \frac{∥ x ∥_{2}}{∥ ~ w ∥_{2}} = \frac{4 ∥ x ∥_{2}}{∥ ~ w ∥_{2}} .

(34)

Because $~ w$ is a line between $w$ and $w_{- i}$ , its norm is bounded from below by $∥ ~ w ∥_{2} \geq \frac{1}{\sqrt{2}} min (∥ w ∥_{2}, ∥ w_{- i} ∥_{2}) \geq \frac{1}{\sqrt{2}} (∥ w ∥_{2} - ∥ w - w_{- i} ∥_{2}) = \frac{1}{\sqrt{2}} (∥ w ∥_{2} - ∥ d_{i} ∥_{2})$ . We can thus uniformly bound the integral and plug in the bound because of its positivity,

$Φ$	$\leq ∥ d_{i} ∥_{2} \int_{0}^{1} ∥ ∥ ∥ \frac{D δ}{D w} (~ w) {∥ ∥ ∥}_{2} d γ$	(35)
	$\leq ∥ d_{i} ∥_{2} \int_{0}^{1} \frac{4 \sqrt{2} ∥ x ∥_{2}}{min (∥ w ∥_{2}, ∥ w_{- i} ∥_{2})} d γ$	(36)
	$= ∥ d_{i} ∥_{2} \frac{4 \sqrt{2} ∥ x ∥_{2}}{min (∥ w ∥_{2}, ∥ w_{- i} ∥_{2})},$	(37)

which completes the proof. ∎

a.3 Auxiliary Theoretical Results

We state the following classic result by [miller1974unbalanced] without proof.

Theorem 1.

(Leave-One-Out Estimator, [miller1974unbalanced]) Define $(x_{i}, y_{i})$ as the point to be removed from the training set. Given the optimal weight vector $w = (X^{⊤} X)^{- 1} X^{⊤} Y$ which solves for a linear model under mean-squared-error loss, the leave-one-out estimator is given by:

w - w_{- i} = (X^{T} X)^{- 1} x_{i} \frac{(y_{i} - w^{T} x_{i})}{1 - x_{i}^{T} (X^{T} X)^{- 1} x_{i}} = (X^{T} X)^{- 1} x_{i} \frac{(y_{i} - w^{T} x_{i})}{1 - h_{i i}} =: d_{i} .

a.4 An Analytical NTK Kernel

In this section, we provide theoretical results that allow deriving the closed form solution of the NTK for the two-layer ReLU network. First, see the paper by Jacot et al. [jacot2018neural] for the original derivation of the neural tangent kernel.

A closed-form solution for two-layer ReLU networks. From [zhang2021rethinking, du2018gradient, Assumption 3.1] we obtain the definition of the Kernel matrix $K^{\infty}$ (termed Gram matrix in the paper [du2018gradient]) for ReLU networks:

	$K_{i j}^{\infty} = K^{\infty} (x_{i}, x_{j})$	$= E_{w \sim N (0, I)} [x_{i}^{⊤} x_{j} I {w^{⊤} x_{i} \geq 0, w^{⊤} x_{j} \geq 0}]$
		$= x_{i}^{⊤} x_{j} E_{w \sim N (0, I)} [I {w^{⊤} x_{i} \geq 0, w^{⊤} x_{j} \geq 0}]$
		$= x_{i}^{⊤} x_{j} \frac{π - arcos (\frac{x_{i}^{⊤} x_{j}}{∥ x_{i} ∥ ∥ x_{j} ∥})}{2 π} .$

The last reformulation uses an analytical result by [cho2009kernel]. The derived result matches the one by [xie2017diverse], which however does not provide a comprehensive derivation.

Appendix B Additional Experimental Results

Data sets for the Classification Tasks When considering classification tasks on the heloc and admission data sets, we threshold the scores based on the median to obtain binary target labels. On the Admission data set (in the classification setting), a counterfactual is found when the predicted first-year average score switches from ‘below median’ to ‘above median’. We then count an invalidation if, after the model update, the score of a counterfactual switches back to ‘below median’. In addition to the aforementioned data sets, we use both the Diabetes and the Compas data sets. The Diabetes data set which contains information on diabetic patients from 130 different US hospitals [strack2014impact]. The patients are described using administrative (e.g., length of stay) and medical records (e.g., test results), and the prediction task is concerned with identifying whether a patient will be readmitted within the next 30 days. We sub sampled a smaller data sets of 10000 points from this dataset. 8000 points are left to train the model, while 2000 points are left for the test set. The Compas data set [Angwin2016] contains data for more than 10,000 criminal defendants in Florida. It is used by the jurisdiction to score defendant’s likelihood of reoffending. We kept a small part of the raw data as features like name, id, casenumbers or date-time were dropped. The classification task consists of classifying an instance into high risk of recidivism. Across all data sets, we dropped duplicate instances.

Discussing the Results As suggested in Section 6, here we are discussing the remaining recourse outcome invalidation results. We show these results for two settings. In Figure 4, we demonstrate the efficacy of our greedy deletion algorithm across 4 data sets on the classification tasks using different classification models (ANN, logistic regression, Kernel-SVM). For the logistic regression and the ANN model, we use the infinitesimal jackknife approximation to calculate the probitively expensive retraining step as described in Section 5. We observe that our method well outperforms random guessing. The results also highlight that while the NTK theory allows to study the deletion effects from a theoretical point of view, if one is interested in empirical worst-case approximations, the infinitesimal jackknife can be a method of choice. As we observe this pattern across all recourse methods, we hypothesize that this is related to the instability of the trained ANN models, and we leave an investigation of this interesting phenomenon for future work.

Additionally, in Figure 5, we compare our SGD-based deletion algorithm to the greedy algorithm. For the SGD-based deletion results, we observe inverse-u-shaped curves on some method-data-model combinations. The reason for this phenomenon can be explained as follows: when the $ℓ_{0}$ regularization strength (i.e., $η$ ) is not strong enough, then the importance weights for the $k$ -th removal with $k > 5$ become more variable (i.e., SGD does not always select the most important data weight for larger $k$ ). This drop in performance can be mitigated by increasing the strength of the $ℓ_{0}$ regularizer within our SGD-based deletion algorithm.

Finally, in Figure 6 we study how well the critical points identified for the NTK model would invalidate a wide 2-layer ReLU network with 10000 hidden nodes. To study that question, we identified the points that lead to the highest invalidation on the NTK using our greedy method, and we then use these identified training points to invalidate the recourses suggested by the wide ANN. As before, we are running these experiments on the full data set across 5 folds. Figure 6 demonstrates the results of this strategy for the SCFE recourse method. We see that this strategy leads to an improvement of up to 30 percentage points over the random baseline, suggesting that critical points under NTK can be used to estimate recourse invalidation for wide ANN models.

Appendix C Implementation Details

c.1 Details on Model Training

We train the classification models using the hyperparameters given in Table 1. The ANN and the Logistic regression models are fit using the quasi-newton lbgfs solver. We add L2-regularization to the ANN weights. The other methods are trained via their analytical solutions. Below, in Algorithms 17 and 22, we show pseudocodes for both our greedy and sgd-based deletion methods to invalidate the recourse outcome. In order to do the optimization with respect to the recourse action stability measure, we slightly adjust Algorithm 22 to optimize the right metric from Definition 2.

c.2 Details on Generating the Counterfactuals

For DICE, for every test input, we generate two different counterfactual explanations. Then we randomly pick either the first or second counterfactual to be the counterfactual assigned to the given input.

Model	Parameters
Linear Regression	OLS, no hyperparameters.
NTK Regression	$β = 2$ (Admission), $β = 5$ (other data sets)
Logistic Regression	L2-Regularization with $C = 1.0$
Kernel-LSSVM	Gaussian Kernel with $γ = 1.0$ (see [cawley2004fast])
ANN	2-Layer, 30 Hidden units, Sigmoid, $α = 10$ (L2-Regularization)

Table 1: Model hyperparameters used in this work

c.3 Details on the Sparsity Regularizer

Since an $ℓ_{0}$ regularizer is computationally intractable for high-dimensional optimization problems, we have to resort to approximations. One such approximation approach was recently suggested by [yamada20a]. The underlying idea consists of converting the combinatorial search problem to a continuous search problem over distribution parameters. To this end, recall our optimization problem from the main text:

ω^{*} = a r g m a x ω \in {0, 1}^{n} m (ω) - η \cdot ∥ 1 - ω ∥_{0} .

(38)

We will now introduce Bernoulli random variables $Z_{i} \in {0, 1}$ with corresponding parameters $π_{i}$ to model the individual $ω_{i}$ . Instead of optimizing the objective above with respect to $ω$ we will optimize with respect to distribution parameters $π$ :

π^{*} = a r g m a x π m (Z (π)) - η \cdot ∥ 1 - Z (π) ∥_{0} .

(39)

Since the above optimization problem is known to suffer from high-variance solutions, [yamada20a] suggest to use a Gaussian-based continuous relaxation of the Bernoulli variables:

{~ Z}_{i} = max (0, min (1, μ_{i} + ϵ_{i})),

(40)

where $ϵ_{i} = N (0, σ^{2})$ , resulting in the following optimization problem:

μ^{*} = a r g m a x μ m (~ Z (μ)) - η \cdot ∥ 1 - ~ Z (μ) ∥_{0} .

(41)

At inference time, the optimal weights are then given by ${~ Z}_{i}^{*} = max (0, min (1, μ_{i}^{*})) \forall i \in [n]$ . To obtain discrete weights, we take the argmax over each individual ${~ Z}_{i}$ .

c.4 Details on the Jackknife Approximation

Required: Model:

f_{w (1)}

; Matrix of Recourses:

{ˇ X}_{1} \in R^{q \times d}

;

d

: input dimension;

q

number of recourse points on test set;

n

: # train points;

M

: max # deleted train points;

s

: invalidation target

ω^{(0)} = 1_{n}

▹

All training instances present

for

m = 1 : M

ω^{(m)} \leftarrow ω^{(m - 1)}

~ Y = 0_{n \times q}

▹

Recourse outcomes

J = 0_{n \times q}

▹

Invalidations present

S^{(m)} \leftarrow {i ∣ ∣ ω_{i}^{(m)} \neq 0}

▹

Set of train instances present at iteration

m

for

i \in S^{(m)}

w(i)new=update\_w(ω(m)−i)

▹

ω_{- i}^{(m)}

has additionally set weight

i

to 0.

▹

Use analytical or IJ solution for

w (ω)

~ Y [i, :] = f_{w_{new}^{(i)}} ({ˇ X}_{1})

▹

New recourse outcomes

J [i, :] = I (~ Y [i, :] < s)

▹

Invalidation present

end for

index

\leftarrow

{a r g m a x}_{i \in S^{(m)}} ∥ J [i, :] ∥_{1}

▹

Find point that leads to highest invalidation

ω^{(m)} [index] = 0

▹

Remove training point

end for

return:

ω^{(M)}

▹

data weights indicating

M

removals

Algorithm 1 Greedy recourse outcome invalidation

When the model parameters $w$ are a function of the data weights by solving (1) we can approximate $w (ω)$ using the infinitesimal Jackknife (IJ) without having to optimize (1) repeatedly [jaeckel1972infinitesimal, efron1982jackknife, giordano2019swiss, giordano2019highswiss]:

w_{IJ} (ω) = w_{1} - H_{1}^{- 1} G_{ω - 1},

(42)

where $G$ and $H_{1}$ are the Jacobian and the Hessian matrices of the loss function with respect to the data weights evaluated at the optimal model parameters $w$ , i.e., $G_{ω - 1} = \frac{1}{n} \sum_{i = 1}^{n} (ω_{i} - 1) \cdot \frac{\partial ℓ (f_{w} (x_{i}), y_{i})}{\partial w}$ and . Note that this technique computes the Hessian matrix $H_{1}$ only once. Using this Jackknife approximation, the Jacobian term $G_{ω - 1}$ becomes an explicit function of the data weights which makes the Jackknife approximation amenable to optimization.

Required: Model:

f_{w (1)}

; Matrix of Recourses:

{ˇ X}_{1} \in R^{q \times d}

;

d

: input dimension;

q

number of recourse points on test set;

n

: # train points;

M

: max # deleted train points;

s

: invalidation target

μ^{(1)} = 1_{n}

▹

Mu are soft data weights that are opimized.

for

m = 1 : Step

▹

Perform

S t e p

number of updates.

δ - loss = 0.0

for

k = 1 : K

▹

Use

K

Monte-Carlo Samples for the approximation

Sample

ϵ_{k}^{(m)} \sim N (0, σ^{2} I_{n})

ω_{k}^{(m)} = max (0, min (1, μ^{(m)} + ϵ_{k}^{(m)}))

▹

Sample (soft) data weights as in [yamada20a]

w(m)k,new=update\_w(ω(m)k)

▹

Compute model weights from data weights either analytically or with IJ

l_{k}^{(m)} = sigmoid (f_{w_{k, new}^{(m)}} ({ˇ X}_{1}) - s)

▹

Predict with new weights and compute soft invalidation.

δ - loss = δ - loss + ∥ l_{k}^{(m)} ∥_{1}

▹

Add up soft inval. loss

end for

r^{(m)} = \sum_{i = 1}^{n} Φ (\frac{1 - (μ^{(m)})_{i}}{σ})

▹

Sparsity Regularizer from [yamada20a]

μ^{(m + 1)} = μ^{(m)} + γ \nabla_{μ^{(m)}} (\frac{δ - loss}{D} + λ r^{(m)})

▹

Grad. Descent with lr.

γ

end for

removed_ind = argsort

(μ^{(Step + 1)})

▹

Sort indices ascendingly

j = 0

ω = 1_{n}

while

j < M

▹

Binarize and fulfil max number M

ω

[removed_ind[j]]= 0

j = j + 1

end while

return:

ω

▹

data weights indicating

M

removals

Algorithm 2 SGD recourse outcome invalidation