Assume our training data consists of $n$ samples from $X \times Y$ and we have trained a model with parameters $\theta_n$, e.g. through maximum likelihood estimation $\theta_n = \arg\min_\theta \frac{1}{n} \sum_{i=1}^n L(x_i, y_i; \theta)$. We would like to understand the influence of a single training datapoint $(x_j, y_j)$ on our parameters. The naive approach would be to refit the model without this datapoint, i.e. $\theta_{n \setminus j} = \arg\min_\theta \sum_{i \neq j} L(x_i, y_i; \theta)$.
It is clear that for, e.g., neural networks, refitting once per datapoint is not feasible at scale.
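To make the cost concrete, here is a minimal sketch of the naive approach on a toy ridge-regression problem (the model, data sizes, and regularization strength `lam` are illustrative assumptions, with the regularizer folded into the per-point loss); note the loop performing one full refit per training point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1                      # toy problem size (assumed)
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def fit(X, y):
    """Minimize (1/n) sum_i [0.5*(x_i @ theta - y_i)^2 + 0.5*lam*||theta||^2]."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

theta_n = fit(X, y)

# Naive approach: one full refit per left-out datapoint, i.e. n refits in total.
theta_loo = np.stack([fit(np.delete(X, j, axis=0), np.delete(y, j, axis=0))
                      for j in range(n)])
```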
Influence functions (IF) offer a possible solution. Before we discuss IF, we need to consider how to measure the influence of a single datapoint; multiple options present themselves (a small numerical sketch follows the list):
1. Change in test loss: $\frac{1}{n} \sum_{(x,y) \in D_{\text{test}}} \left[ L(x, y; \theta_n) - L(x, y; \theta_{n \setminus j}) \right]$
2. Change in parameters: $\|\theta_n - \theta_{n \setminus j}\|$
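Given $\theta_n$ and a refit $\theta_{n \setminus j}$, both measures are cheap to evaluate. A minimal sketch, reusing the toy ridge-regression setup above (the test set and the choice $j = 7$ are again illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1                      # same toy setup as above
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(50, d))            # illustrative test set
y_test = X_test @ rng.normal(size=d)

def fit(X, y):
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def loss(X, y, theta):
    return 0.5 * (X @ theta - y) ** 2 + 0.5 * lam * theta @ theta

j = 7
theta_n = fit(X, y)
theta_loo_j = fit(np.delete(X, j, axis=0), np.delete(y, j, axis=0))

# Option 1: change in (mean) test loss caused by removing point j.
delta_loss = np.mean(loss(X_test, y_test, theta_n) - loss(X_test, y_test, theta_loo_j))
# Option 2: change in parameters.
delta_params = np.linalg.norm(theta_n - theta_loo_j)
```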
We will begin by considering the second option.
It has been shown (Cook & Weisberg, 1986) that by upweighting the loss of $(x_j, y_j)$, we can efficiently estimate this change using second-order approximations.
To this end we define $\theta_{\epsilon, n \setminus j} = \arg\min_\theta \frac{1}{n} \sum_{i=1}^n L(x_i, y_i; \theta) + \epsilon L(x_j, y_j; \theta)$. Essentially, we weight the loss of $(x_j, y_j)$ by an additional factor of $\epsilon$.
Then a classical result gives the derivative of the parameters with respect to this upweighting in closed form:

$$\left.\frac{d\theta_{\epsilon, n \setminus j}}{d\epsilon}\right|_{\epsilon=0} = -H_{\theta_n}^{-1} \nabla_\theta L(x_j, y_j; \theta_n), \qquad H_{\theta_n} = \frac{1}{n} \sum_{i=1}^n \nabla_\theta^2 L(x_i, y_i; \theta_n),$$

where $H_{\theta_n}$ is the Hessian of the training loss, assumed to be invertible.
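The formula can be checked numerically. In the toy ridge-regression setup the upweighted minimizer $\theta_{\epsilon, n \setminus j}$ has a closed form, so a finite-difference derivative in $\epsilon$ can be compared against $-H_{\theta_n}^{-1} \nabla_\theta L(x_j, y_j; \theta_n)$ (a sketch under the same illustrative assumptions as above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1                      # same toy ridge-regression setup
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

H = X.T @ X / n + lam * np.eye(d)            # Hessian of the mean training loss
theta_n = np.linalg.solve(H, X.T @ y / n)
j = 7                                        # illustrative choice of datapoint

def theta_eps(eps):
    """Minimizer of (1/n) sum_i L_i + eps * L_j (closed form for ridge loss)."""
    A = H + eps * (np.outer(X[j], X[j]) + lam * np.eye(d))
    return np.linalg.solve(A, X.T @ y / n + eps * y[j] * X[j])

# Central finite difference of theta w.r.t. eps at eps = 0 ...
eps = 1e-4
fd = (theta_eps(eps) - theta_eps(-eps)) / (2 * eps)
# ... versus the closed-form influence -H^{-1} grad L_j.
grad_j = (X[j] @ theta_n - y[j]) * X[j] + lam * theta_n
print(np.max(np.abs(fd + np.linalg.solve(H, grad_j))))  # agrees to high precision
```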
The key insight now is to notice that upweighting by $\epsilon = -\frac{1}{n}$ corresponds to removing $(x_j, y_j)$: the objective becomes $\frac{1}{n} \sum_{i \neq j} L(x_i, y_i; \theta)$. A first-order Taylor expansion in $\epsilon$ then yields $\theta_{n \setminus j} - \theta_n \approx \frac{1}{n} H_{\theta_n}^{-1} \nabla_\theta L(x_j, y_j; \theta_n)$.
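This gives a one-shot estimate of the leave-one-out parameter change without refitting; the sketch below compares it against an actual refit (same illustrative setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1                      # same toy ridge-regression setup
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def fit(X, y):
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

theta_n = fit(X, y)
H = X.T @ X / n + lam * np.eye(d)
j = 7

# Influence-function estimate: theta_{n\j} - theta_n ≈ (1/n) H^{-1} grad L_j.
grad_j = (X[j] @ theta_n - y[j]) * X[j] + lam * theta_n
predicted = np.linalg.solve(H, grad_j) / n

# Ground truth from an actual leave-one-out refit.
actual = fit(np.delete(X, j, axis=0), np.delete(y, j, axis=0)) - theta_n
print(np.linalg.norm(predicted - actual) / np.linalg.norm(actual))  # small
```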
Further, we can also compute the change of loss for a test point in closed form: by the chain rule,

$$\left.\frac{d\,L(x_{\text{test}}, y_{\text{test}}; \theta_{\epsilon, n \setminus j})}{d\epsilon}\right|_{\epsilon=0} = -\nabla_\theta L(x_{\text{test}}, y_{\text{test}}; \theta_n)^\top H_{\theta_n}^{-1} \nabla_\theta L(x_j, y_j; \theta_n),$$

so the change in test loss from removing $(x_j, y_j)$ is approximately $-\frac{1}{n}$ times this quantity.
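A sketch of this closed-form estimate, again on the toy ridge-regression setup, compared against the test-loss change from an actual leave-one-out refit (the test point is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1                      # same toy ridge-regression setup
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_t, y_t = rng.normal(size=d), 0.0           # illustrative test point

def fit(X, y):
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def loss(x, y, theta):
    return 0.5 * (x @ theta - y) ** 2 + 0.5 * lam * theta @ theta

theta_n = fit(X, y)
H = X.T @ X / n + lam * np.eye(d)
j = 7

grad_j = (X[j] @ theta_n - y[j]) * X[j] + lam * theta_n
grad_t = (x_t @ theta_n - y_t) * x_t + lam * theta_n

# Closed form: -1/n times the derivative, i.e. (1/n) grad_t^T H^{-1} grad_j.
predicted = grad_t @ np.linalg.solve(H, grad_j) / n
# Ground truth from an actual refit without point j.
theta_loo = fit(np.delete(X, j, axis=0), np.delete(y, j, axis=0))
actual = loss(x_t, y_t, theta_loo) - loss(x_t, y_t, theta_n)
print(predicted, actual)                     # should be close
```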