The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes (2024)

Myeongseob Ko^1, Feiyang Kang^1, Weiyan Shi^2, Ming Jin^1, Zhou Yu^2, Ruoxi Jia^1
1Virginia Tech 2Columbia University
{myeongseob, fyk, jinming, ruoxijia}@vt.edu, {ws2634, zy2461}@columbia.edu

Abstract

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models.

In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients only for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code is available at https://github.com/ruoxi-jia-group/Forward-INF.

1 Introduction

As the popularity of large-scale, black-box machine learning models continues to surge across diverse applications, the need for transparency, an understanding of the factors driving their predictive behaviors, becomes increasingly critical. These models are learned from training data, and as such, an important step towards achieving transparency lies in estimating the influence of individual training data points on the model’s predictions.

Extensive research on training data influence estimation has been conducted over the years [21, 28, 26, 15, 17]. Despite the diversity of techniques, they all fundamentally revolve around the central idea of assessing the counterfactual impact of a training data source:

How would the prediction on specific test points change if we removed a training source? (P1)

One line of approaches [15] focuses on the direct evaluation of the counterfactual impact, i.e., by retraining a model on the set excluding the training source and measuring the change in the prediction. Besides the obvious computational overhead, such evaluation results suffer from a low signal-to-noise ratio due to the stochasticity in widely used learning algorithms and are largely inconsistent across different runs [36]. To magnify the change caused by removing a single source, existing techniques mostly involve retraining models on smaller subsets of the training data and measuring a source’s influence by aggregating its contribution to different subsets not containing the source [11, 16, 12]. While these methods produce more consistent influence scores, they are infeasible for large-scale models.

Another line of approaches bypasses the need for retraining by estimating the influence through the final trained model or intermediate checkpoints reached during the training process [21, 28]. In particular, they evaluate a training point’s influence using the corresponding gradient of the final model or checkpoints, which effectively represents the local change made by introducing the point into training. However, calculating gradients is not only time-intensive but also memory-inefficient compared to a forward pass [25]. In our tests, it took up to 10.92 times longer and used 5.36 times more memory than inference for the Vision Transformer vit-b-32 [8]. Furthermore, identifying the most influential training point requires computing the gradient for every single training data point. Combined, these challenges hinder the efficient determination of data influence, especially for large-scale datasets and large models. Figure 1 highlights the conceptual difference between existing approaches and ours; we defer the detailed discussion of related work to Appendix A.


Given the discrepancy in efficiency between gradient calculation and inference, an intriguing yet uncharted question arises: can we maximize the use of forward passes when estimating influence? We introduce a technique that exploits the considerable efficiency gap between forward and backward passes, especially when applied to all training points. This method is anchored on the following Mirrored Influence Hypothesis.

The Mirrored Influence Hypothesis. The train-to-test influence characterized by the problem (P1) is correlated with the test-to-train influence characterized by (P2) in which the role of training and test points is swapped:

How would the prediction on a training source change if the model was trained on specific test points? (P2)

More formally, consider a training dataset comprising $N$ data sources, $D_{\text{trn}} = D_1 \cup \ldots \cup D_N$, and a test set $D_{\text{tst}}$. Let $\mathcal{A}$ denote the learning algorithm, which takes a dataset as input and returns a model, and let $\mathcal{L}$ be a loss function. The train-to-test influence characterized by P1 can be expressed as

$$\text{Inf}(D_i \rightarrow D_{\text{tst}}) = \mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_{\text{tst}}) - \mathcal{L}(\mathcal{A}(D_{\text{trn}} \setminus D_i), D_{\text{tst}}). \qquad (1)$$

On the other hand, the test-to-train influence characterized by P2 can be written as

$$\text{Inf}(D_i \leftarrow D_{\text{tst}}) = \mathcal{L}(\mathcal{A}(D_{\text{trn}} \cup D_{\text{tst}}), D_i) - \mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_i). \qquad (2)$$
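To make the two quantities concrete, the following toy sketch (illustrative only, not the paper’s implementation; the tiny logistic-regression model, dataset, and hyperparameters are invented) computes Eqn. (1) and Eqn. (2) directly by retraining:

```python
import numpy as np

def train(X, y, steps=2000, lr=0.5, reg=1e-2):
    """Deterministic full-batch logistic-regression training (near-noiseless)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) / len(y) + reg * w)
    return w

def loss(w, X, y):
    """Mean cross-entropy loss of model w on (X, y)."""
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=40) > 0).astype(float)
X_tst = rng.normal(size=(1, 3)); y_tst = np.array([1.0])

w_full = train(X, y)                                        # A(D_trn)
w_aug = train(np.vstack([X, X_tst]), np.append(y, y_tst))   # A(D_trn ∪ D_tst)

i = 0                                   # training point under examination
mask = np.arange(len(y)) != i
w_loo = train(X[mask], y[mask])         # A(D_trn \ D_i)

# Eqn (1): train-to-test influence of D_i on D_tst
inf_fwd = loss(w_full, X_tst, y_tst) - loss(w_loo, X_tst, y_tst)
# Eqn (2): test-to-train influence of D_tst on D_i
inf_bwd = loss(w_aug, X[i:i+1], y[i:i+1]) - loss(w_full, X[i:i+1], y[i:i+1])
```

The hypothesis predicts that, computed across all training points $i$, the two scores correlate.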

We find that $\text{Inf}(D_i \rightarrow D_{\text{tst}})$ is highly correlated with $\text{Inf}(D_i \leftarrow D_{\text{tst}})$ for $i \in \{1, \ldots, N\}$. Figure 2 illustrates the correlation for convex and non-convex models trained on the CIFAR-10 dataset; we defer results on other datasets and models to Appendix B due to the similarity in their trends.

Leveraging the Hypothesis. In most data influence estimation applications, the size of the test set is typically much smaller than that of the training set ($|D_{\text{tst}}| \ll |D_{\text{trn}}|$). For example, when identifying influential training points for a specific model prediction, $D_{\text{tst}}$ represents just a single test point. Similarly, in detecting low-quality training data, $D_{\text{tst}}$ is a small, clean reference set, usually less than 1% of the entire training set [40, 38, 18]. The Mirrored Influence Hypothesis enables leveraging this size asymmetry between the training data and the test samples under concurrent examination to develop more efficient influence estimation algorithms. The hypothesis allows for a shift in approach: from calculating the train-to-test influence of training data, which requires locally updating the model for each training point, to assessing the test-to-train influence. In practice, this means updating the model on the relatively few test samples and conducting forward passes on the training samples. This shift applies the more computationally intensive operation (the backward pass) to the smaller test set, while the computationally lighter task (the forward pass) is applied to the larger training set, optimizing overall efficiency.
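The efficiency argument can be expressed as a back-of-the-envelope cost model (a sketch; the 10.92x backward-to-forward ratio is the vit-b-32 measurement quoted earlier, and the dataset sizes are hypothetical):

```python
# Cost of one forward pass is c_fwd; a gradient computation is taken to cost
# roughly 10.92x that, following the vit-b-32 timing quoted above.
def cost_train_to_test(n_train, c_fwd=1.0, bwd_ratio=10.92):
    # gradient-based methods: one backward pass per training point
    return n_train * bwd_ratio * c_fwd

def cost_test_to_train(n_train, n_test, c_fwd=1.0, bwd_ratio=10.92):
    # mirrored estimation: backward passes only for the few test samples,
    # plus one forward pass per training point
    return n_test * bwd_ratio * c_fwd + n_train * c_fwd

n_train, n_test = 1_000_000, 100
speedup = cost_train_to_test(n_train) / cost_test_to_train(n_train, n_test)
print(f"approximate speedup: {speedup:.1f}x")
```

With one million training points and one hundred test points, the mirrored formulation is roughly an order of magnitude cheaper under this simple model.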


Evaluation. Leveraging these insights, we develop a new influence estimation algorithm. We evaluate its performance across diverse applications handling image data, such as detecting mislabeled data, identifying data leakage, analyzing memorization, and attributing data in diffusion models. To further demonstrate its wide-ranging applicability, we extend its use to tracing behavior in language models. The new method not only showcases promising utility but also offers significantly faster performance compared to traditional train-to-test influence techniques. For instance, our method detects data leakage with 100% accuracy on CIFAR-10, surpassing the Influence Function [21] (30% detection rate) and TracIn [28] (0% detection rate) while being 30 and 40 times faster, respectively.

2 Delve Into the Hypothesis

In this section, we empirically assess the Mirrored Influence Hypothesis as applied to different datasets and models. We then study two well-known influence approximators, the Influence Function [21] and TracIn [28], and show that the hypothesized correlation between train-to-test and test-to-train influences holds naturally under the assumptions made by these approximators.

2.1 Empirical Study

We investigate the Hypothesis in both deterministic and stochastic learning settings, owing to the distinct noise levels inherent in the influence scores of these two scenarios. Specifically, both train-to-test and test-to-train influences depend on the chosen learning algorithm $\mathcal{A}$. Algorithms like stochastic gradient descent (SGD) introduce randomness, such as through random mini-batch selection, which affects the training process and, consequently, the trained models. This randomness can cause variations in the losses evaluated on these models, ultimately impacting the influence scores, which are derived from these losses, as seen in Eqn. (1) and (2).

Particularly, the influences for individual training and test points, which are determined by the exclusion of a training point from or the addition of a test point to the training set, are typically small and can be heavily impacted by the stochasticity of the learning process. For example, prior research [35] shows that the variability in an individual point’s influence due to learning stochasticity often far exceeds its magnitude. This means that even when evaluating the same type of influence, there can be significant inconsistency in the rankings of different data points’ influence scores across different algorithm runs. Hence, to meaningfully analyze the correlation between train-to-test and test-to-train influences, it is critical to focus on settings where the signal strength in both types of influence scores is substantial enough to withstand the overshadowing effect of noise.

A Near-Noiseless Setting. We start with a near-noiseless setting where the model being trained is strongly convex and the learning algorithm is deterministic. With enough iterations, the final model $\mathcal{A}(D)$ is guaranteed to converge to the vicinity of the global minimum for any training dataset $D$. Specifically, we train a logistic regression model using L-BFGS [24] with L2 regularization. Due to the low noise in this setting, we can investigate the point-to-point influence, i.e., $|D_i| = |D_{\text{tst}}| = 1$. We use the Pearson correlation and the Spearman rank-order correlation coefficient to measure the correlation between the two influences. As we are mainly interested in test points that have a high loss (e.g., in the application of debugging a misclassified test point), we select the 10 test points with the highest loss and average their correlation scores.
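For reference, the two correlation measures used throughout this section can be computed directly; a minimal numpy sketch (the simple Spearman version below assumes no tied values):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def spearman(a, b):
    """Spearman rank-order correlation: Pearson correlation on the ranks.
    (Average ranks would be needed for ties; this version assumes none.)"""
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson(rank(a), rank(b))

a = [0.1, 0.4, 0.2, 0.9]
b = [1.0, 3.9, 2.1, 8.7]   # monotonically related to a
print(pearson(a, b), spearman(a, b))  # spearman is exactly 1.0 here
```

Spearman’s use of ranks rather than raw values is what gives it the noise resistance discussed in the results below.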

A Noisy Setting. As discussed, training non-convex models using stochastic learning algorithms inherently produces a high level of noise in the influence between individual training and test points. Specifically, we train a convolutional neural network (CNN) using SGD with a learning rate of 0.01. To mitigate the noise, our initial approach was to average the influence scores over multiple runs of the learning algorithm; however, the number of runs required to sufficiently reduce the noise is prohibitively demanding. To effectively analyze the correlation between the two types of influences, we instead design experiments where the influence scores across different $D_i$ are larger in magnitude, reducing the chance of these scores being overwhelmed by noise. Specifically, we assess the group-to-group influence, i.e., the impact of various groups of training points on a test set. Modifying a group of points in the training set results in more substantial changes in loss, thereby making the influence scores more indicative. To amplify the effect of each group’s removal or addition, we randomly mislabel 50% of the samples into different classes and assign varying mislabeling ratios to each group. We use 30 training groups with mislabeling ratios ranging linearly from 0% to 100%. $D_{\text{tst}}$ consists solely of clean samples.
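The group construction described above can be sketched as follows (an illustrative version; the 30 groups and linear 0–100% ratios come from the text, while group size and class count are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, group_size, n_classes = 30, 100, 10
labels = rng.integers(0, n_classes, size=(n_groups, group_size))
orig = labels.copy()

# one mislabeling ratio per group, spaced linearly from 0% to 100%
ratios = np.linspace(0.0, 1.0, n_groups)
for g, r in enumerate(ratios):
    flip = rng.random(group_size) < r                      # which samples to corrupt
    offset = rng.integers(1, n_classes, size=group_size)   # nonzero class shift
    labels[g, flip] = (labels[g, flip] + offset[flip]) % n_classes
```

The nonzero offset guarantees each corrupted sample is reassigned to a different class, so a group’s effective noise level matches its assigned ratio.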

Result. In Table 2.1, we show the Pearson and Spearman rank-order correlations between $\text{Inf}(D_i \rightarrow D_{\text{tst}})$ and $\text{Inf}(D_i \leftarrow D_{\text{tst}})$ for both settings. For the near-noiseless setting, we observe high correlation scores (greater than 0.96 in terms of Pearson correlation) across different datasets. The Spearman correlation, on the other hand, is relatively lower. As illustrated by Figure 2, there exists a high-density region with many similar values, which can lead to many tied ranks; these ties disrupt the monotonic relationship that the Spearman correlation seeks to measure.

In the noisy setting with group-to-group influence, the correlation between the two types of influences remains high. Notably, while the Pearson correlation decreases relative to the near-noiseless scenario, the Spearman correlation retains a high value. This discrepancy arises partly because the Pearson correlation, which relies on actual data values, is more susceptible to noise, whereas the Spearman correlation employs ranks rather than raw values and is therefore inherently more resistant to the distorting effects of noise. Moreover, as depicted in Figure 2, the relationship between train-to-test and test-to-train influences remains correlated, albeit with a diminished linear characteristic. The pronounced linear correlation for point-to-point influences in the near-noiseless setting aligns with the theoretical insights discussed in the subsequent subsection: under minor perturbations to the training set, such as the alteration of a single data point, training and test data play symmetrical roles and the two types of influences coincide. Conversely, for group-to-group influence, the removal or addition of entire data groups leads to more significant model perturbations, which are not adequately described by current theoretical frameworks. An in-depth theoretical exploration of the correlation between the two influences under general conditions is outside this paper’s scope and presents an intriguing avenue for future research.

Setting                              Metric     MNIST     FMNIST    CIFAR-10
Near-Noiseless (point-to-point inf)  Pearson    0.9975    0.9939    0.9673
                                     Spearman   0.9027    0.7274    0.8069
Noisy (group-to-group inf)           Pearson    0.9640    0.9752    0.8551
                                     Spearman   0.9907    0.9915    0.8848

2.2 Validity of the Hypothesis for Influence Approximators

In addition to the empirical study, we show that the hypothesized mirrored influence holds for well-known influence approximators: the Influence Function [21] and TracIn [28].

We begin by introducing the notation. For ease of exposition, we examine the validity of the Hypothesis in the case where $|D_i| = 1$ and $|D_{\text{tst}}| = 1$; the argument naturally extends to more general cases. Specifically, consider a training set consisting of $n$ samples $D_{\text{trn}} = \{z_1, z_2, \ldots, z_n\}$, a given test sample $z_{\text{tst}}$, and a neural network parameterized by $\theta \in \mathbb{R}^d$. Define the loss function $\mathcal{L}$ as the model's empirical risk on the training dataset, $\mathcal{L}(\theta, D_{\text{trn}}) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta, z_i)$, where $\ell(\theta, z_i)$ denotes the loss of a predictor parameterized by $\theta$ on the training sample $z_i$. The optimal model parameters are given by the following empirical risk minimization: $\hat{\theta} := \arg\min_{\theta} \mathcal{L}(\theta, D_{\text{trn}})$.

Influence Function (IF). The idea of IF is to analyze the change in prediction loss when a training sample is up-weighted infinitesimally. In particular, if we perturb the weight of a sample $z$ from $1$ to $1 + \varepsilon$, the new parameters on the perturbed training dataset are given by $\hat{\theta}_{\varepsilon,z} = \arg\min_{\theta} \mathcal{L}(\theta, D) + \varepsilon\,\ell(\theta, z)$. Assuming the loss function is twice-differentiable and strictly convex, the influence of an infinitesimal perturbation of a training sample $z$ on the loss of a test sample $z_{\text{tst}}$ can be calculated as [21]

$$\left.\frac{d\,\ell(\hat{\theta}_{\varepsilon,z}, z_{\text{tst}})}{d\varepsilon}\right|_{\varepsilon=0} = -\nabla_{\theta}\ell(\hat{\theta}, z_{\text{tst}})^{T} H_{\hat{\theta}}^{-1} \nabla_{\theta}\ell(\hat{\theta}, z), \qquad (3)$$

where $H_{\hat{\theta}} := \nabla_{\theta}^{2}\,\mathcal{L}(\hat{\theta}, D)$ denotes the Hessian. Since removing a point $z$ is equivalent to up-weighting it by $\varepsilon = -\frac{1}{n}$, for $n$ sufficiently large so that $\frac{1}{n} \rightarrow 0$, one can approximate the train-to-test influence defined in Eqn. (1) with its first-order Taylor approximation, which gives

$$\text{Inf}(D_i \rightarrow D_{\text{tst}}) \approx \left(-\frac{1}{n}\right)\cdot\left[-\left.\frac{d\,\ell(\hat{\theta}_{\varepsilon,z}, z_{\text{tst}})}{d\varepsilon}\right|_{\varepsilon=0}\right],$$

Combining this with Eqn. (3), we have

$$\text{Inf}(D_i \rightarrow D_{\text{tst}}) \approx -\frac{1}{n}\,\nabla_{\theta}\ell(\hat{\theta}, z_{\text{tst}})^{T} H_{\hat{\theta}}^{-1} \nabla_{\theta}\ell(\hat{\theta}, z). \qquad (4)$$

Symmetrically, consider the alternative of adding a test sample $z_{\text{tst}}$ with weight $\varepsilon$ to the training dataset. The model trained with the new objective is $\hat{\theta}_{\varepsilon,z_{\text{tst}}} = \arg\min_{\theta} \mathcal{L}(\theta, D) + \varepsilon\,\ell(\theta, z_{\text{tst}})$. Similar to Eqn. (3), the influence of training on the test sample $z_{\text{tst}}$ with an infinitesimal weight $\varepsilon$ on the prediction loss of a training sample $z$ can be calculated as

$$\left.\frac{d\,\ell(\hat{\theta}_{\varepsilon,z_{\text{tst}}}, z)}{d\varepsilon}\right|_{\varepsilon=0} = -\nabla_{\theta}\ell(\hat{\theta}, z)^{T} H_{\hat{\theta}}^{-1} \nabla_{\theta}\ell(\hat{\theta}, z_{\text{tst}}). \qquad (5)$$

As adding a test point into the training set is the same as setting $\varepsilon = \frac{1}{n}$, following the same procedure, we can again linearly approximate the test-to-train influence in Eqn. (2) by computing

$$\text{Inf}(D_i \leftarrow D_{\text{tst}}) \approx -\frac{1}{n}\,\nabla_{\theta}\ell(\hat{\theta}, z)^{T} H_{\hat{\theta}}^{-1} \nabla_{\theta}\ell(\hat{\theta}, z_{\text{tst}}). \qquad (6)$$

Since the Hessian is symmetric by definition, Eqn. (4) and Eqn. (6) are equivalent. We can thus observe that using influence functions to approximate influence yields the same result for both train-to-test and test-to-train influences, which coincides with our Mirrored Influence Hypothesis.
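The symmetry argument can be checked numerically; in this sketch, a random symmetric positive-definite matrix stands in for the Hessian and random vectors stand in for the two gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100
A = rng.normal(size=(d, d))
H = A @ A.T + d * np.eye(d)      # stand-in for the (symmetric, PD) Hessian
g_trn = rng.normal(size=d)       # stand-in for grad of l(theta-hat, z)
g_tst = rng.normal(size=d)       # stand-in for grad of l(theta-hat, z_tst)

H_inv = np.linalg.inv(H)
inf_fwd = -(1 / n) * g_tst @ H_inv @ g_trn   # Eqn (4)
inf_bwd = -(1 / n) * g_trn @ H_inv @ g_tst   # Eqn (6)
assert np.isclose(inf_fwd, inf_bwd)  # equal because H (hence H^-1) is symmetric
```

The quadratic form $u^T H^{-1} v = v^T H^{-1} u$ for any symmetric $H$, which is exactly why the two influence approximations coincide.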

TracIn. TracIn [28] approximates the influence of a training sample $z$ on a test sample $z_{\text{tst}}$ using a first-order approximation of the model, aggregated over multiple checkpoints saved during the training process:

$$\text{Inf}(D_i \rightarrow D_{\text{tst}}) \approx \sum_{c \in \mathcal{C}} \eta_{c}\, \nabla_{\theta}\ell(\hat{\theta}_{c}, z_{\text{tst}}) \cdot \nabla_{\theta}\ell(\hat{\theta}_{c}, z), \qquad (7)$$

where $c$ denotes an iteration index during model training, $\mathcal{C}$ denotes the set of iteration indices where checkpoints are available, and $\eta_c$ and $\hat{\theta}_c$ represent the step size and the model weights at iteration $c$, respectively. We then consider the alternative of estimating the influence on the training sample $z$ from training the model on the test sample $z_{\text{tst}}$ at each checkpoint, which is given by

Inf(D_i ← D_tst) ≈ ∑_{c∈𝒞} η_c ∇_θ ℓ(θ̂_c, z) · ∇_θ ℓ(θ̂_c, z_tst).   (8)

Again, with the TracIn approximator, the influence between a training–test sample pair can be calculated in either direction with identical results. Echoing our intuition, these results suggest the broad applicability of the proposed Hypothesis across influence approximators.
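Because the TracIn score in Eqns. (7) and (8) is a sum of inner products, its symmetry can be checked directly with stand-in gradient vectors (the vectors and step sizes below are synthetic placeholders, not real model gradients):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the per-checkpoint loss gradients of a training sample z
# and a test sample z_tst: three checkpoints, ten parameters each.
grads_z = [rng.normal(size=10) for _ in range(3)]
grads_ztst = [rng.normal(size=10) for _ in range(3)]
etas = [0.1, 0.05, 0.01]  # step size at each checkpoint

# Train-to-test direction, as in Eqn. (7).
inf_fwd = sum(eta * g_tst @ g_z
              for eta, g_tst, g_z in zip(etas, grads_ztst, grads_z))
# Test-to-train direction, as in Eqn. (8): the operands are swapped.
inf_bwd = sum(eta * g_z @ g_tst
              for eta, g_z, g_tst in zip(etas, grads_z, grads_ztst))

# The inner product commutes, so both directions give the same score.
assert np.isclose(inf_fwd, inf_bwd)
```

The equality holds exactly (up to floating point) because each summand η_c ⟨a, b⟩ is symmetric in its two gradient arguments.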

3 Forward-INF: An Influence Approximation Algorithm Harnessing Forward Passes

In this section, we present a new data influence estimation algorithm unlocked by the Hypothesis. It differs from existing methods by substituting a forward pass for the backward-pass computation on each individual training point, thereby reducing computational cost.

Inspired by the Hypothesis, to rank the train-to-test influences among different training points, we can alternatively order them by the test-to-train influences, as described in Eqn. (2). Specifically, calculating Inf(D_i ← D_tst) involves first acquiring two models, θ̂ = 𝒜(D) and θ̂_{+D_tst} = 𝒜(D ∪ D_tst). We assume that θ̂ is available after training. As obtaining θ̂_{+D_tst} by training on D ∪ D_tst from scratch can be expensive, we propose to use continual learning and obtain this model by updating the existing model θ̂ with D_tst. Denote the resulting model as θ̃_{+D_tst}.
In Appendix C, we will show that when n is large and θ̂ and θ̂_{+D_tst} are close, continually updating θ̂ on D_tst is a good approximation to training from scratch. In Appendix E, we will also empirically compare the resulting influence scores of the two settings. Finally, with the two models θ̂ and θ̃_{+D_tst}, one can calculate the influence for each training source D_i with two forward passes, which give ℒ(θ̃_{+D_tst}, D_i) and ℒ(θ̂, D_i), and then take the difference.

Implementation. The pseudo-code is provided in Algorithm 1. We call this algorithm Forward-INF because it performs forward passes on the training set. Note that the algorithm still applies backward passes on the test set. However, because the training set is typically orders of magnitude larger than the number of test points inspected concurrently, this algorithm is usually much faster than existing methods, such as Influence Functions and TracIn, which apply backward passes on the training set (and the test set).

Also, note that gradient ascent is used by default to update θ̂. This is because, if the test sample is drawn from a distribution similar to the training data, the magnitude of the gradient ∇_θ ℓ(θ, z_tst)|_{θ=θ̂} tends to be small. Thus, employing gradient descent would yield only a small loss difference ℒ(θ̂_K, D_i) − ℒ(θ̂, D_i). By contrast, gradient ascent is likely to produce a model that underfits points similar to the test points, resulting in a larger loss difference for training points, which is beneficial for comparing their influences. However, we observe in experiments that both gradient descent and ascent perform similarly well. A detailed ablation study on this aspect is deferred to Appendix E.

Algorithm 1: Forward-INF
1: Input: Training set D_trn = D_1 ∪ … ∪ D_n, trained model θ̂, target test samples D_tst, number of continual learning iterations K, learning rate α
2: Output: Forward-INF(D_i) for i = 1, …, n
3: Initialize θ̂_0 ← θ̂
4: for j = 0 to K − 1 do
5:   Gradient ascent: θ̂_{j+1} ← θ̂_j + α ∇_θ ℒ(θ̂_j, D_tst)
6: end for
7: for D_i ⊂ D_trn do
8:   Forward pass of θ̂_K to get ℒ(θ̂_K, D_i)
9:   Forward pass of θ̂ to get ℒ(θ̂, D_i)
10:  Forward-INF(D_i) = ℒ(θ̂_K, D_i) − ℒ(θ̂, D_i)
11: end for
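The steps above can be sketched in NumPy on a toy linear model with a squared loss. This is a minimal illustration, not the paper's implementation: the helper `forward_inf`, the toy data, and the hyperparameters K and α are all illustrative choices, and the test point is deliberately a duplicate of training source 0 (mirroring the data leakage setting) so that the highest score should land on that source.

```python
import numpy as np

def loss(theta, X, y):
    # Mean squared error of a linear model; a toy stand-in for L.
    return float(np.mean((X @ theta - y) ** 2))

def loss_grad(theta, X, y):
    return 2.0 * X.T @ (X @ theta - y) / len(y)

def forward_inf(theta_hat, train_sources, X_tst, y_tst, K=12, alpha=0.05):
    """Sketch of Algorithm 1: K gradient-ascent steps on the test loss,
    then one loss difference (two forward passes) per training source."""
    theta = theta_hat.copy()
    for _ in range(K):  # continual update of theta_hat on D_tst
        theta = theta + alpha * loss_grad(theta, X_tst, y_tst)
    # Influence of D_i = L(theta_K, D_i) - L(theta_hat, D_i)
    return [loss(theta, Xi, yi) - loss(theta_hat, Xi, yi)
            for Xi, yi in train_sources]

# Toy leakage scenario: the test point duplicates training source 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X[0] = np.array([4.0, 0.0, 1.0])  # make the duplicated point distinctive
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=20)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # the "trained" model
sources = [(X[i:i + 1], y[i:i + 1]) for i in range(20)]
scores = forward_inf(theta_hat, sources, X[0:1], y[0:1])
# The duplicated source receives the largest loss change.
```

Note that only the K ascent steps require gradients; scoring each source then costs two forward passes, which is where the asymmetry between test-set and training-set sizes pays off.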

Hyperparameter. The number of maximization iterations K and the learning rate α can be tuned if one has access to, or can create, some ground-truth "influential points." For example, one could use a subset of the training data as test samples; intuitively, the same training point should be most influential on its own copy. One can then tune K so that the duplicates in the training set (i.e., the ground-truth influential points) are assigned the highest influence scores.

4 Application

In this section, we evaluate our approach on both vision and natural language processing (NLP) tasks. In particular, we apply our proposed method to the data influence estimation problem in diffusion models [36] (Section 4.1), data leakage detection [4] (Section 4.2), analysis of memorization [11] (Section 4.3), as well as mislabeled data detection (Section 4.4).

We extend our method to NLP to showcase its performance on a model behavior tracing task [3] (Section 4.5). The scope of this study is to emphasize the method's versatility across different applications rather than to outperform existing application-specific baselines in each case. However, we do compare our test-to-train influence calculation method with existing train-to-test influence methods, both characterized by their application-agnostic nature.

4.1 Data Influence Estimation in Diffusion Model

[Figure 3]

Motivation. Recent advancements in generative models, such as stable diffusion [30], have shown remarkable performance in synthesizing high-quality images. Even though the generated images differ from the original training data, they are largely influenced by it, raising potential issues of copyright infringement [37]. It is therefore important to identify the training points that contribute most to synthesizing a specific output.

Setup. Following [36], we use stable diffusion [30] as a pre-trained model and fine-tune it on a specific concept from ImageNet with various pre-defined prompts related to that concept. This yields a model customized to the concept. By construction, the images synthesized by the customized model are computationally influenced by the fine-tuning examples. Hence, the fine-tuning examples can be regarded as ground-truth high-influence training points. Our goal is to test whether the proposed method indeed assigns higher influence to the fine-tuning points than to points in the pre-training set. We regard each individual synthesized image as D_tst and apply our method to calculate the influence of individual training points in both the pre-training set (i.e., LAION-2B [31]) and the fine-tuning set. We adopt a strategy similar to [3] to accelerate the evaluation. In particular, we create a candidate set D_candidate ⊂ D_trn [3], consisting of: (1) the ground truth, i.e., the fine-tuning examples, and (2) distractors, comprising the 50 pre-training samples whose captions overlap most with the given prompt as well as 50 random samples drawn from the pre-training set.

Result. Figure 3 shows the qualitative results of our proposed method. We retrieve the 7 samples with the highest influence scores from the candidate set. We observe that the fine-tuning image is consistently ranked with the highest influence. This aligns with the design of our experiment, where the fine-tuning example is deliberately constructed to exert the most computational influence on the synthesized image. The remaining retrieved samples mostly share similarities with the synthesized image in image, caption, or both. We also provide quantitative results on identifying the ground truth across different candidate-set sizes, along with comparisons to the baselines, in Appendix E.

4.2 Data Leakage Detection

                CIFAR-10 – ResNet18          CIFAR-100 – ResNet50
Method          Time   T-1   T-5   T-10      Time   T-1   T-5   T-10
IF-100          10     30    35    45        38     0     0     20
IF-1000         15     45    45    50        92     0     0     25
IF-10000        72     0     0     0         570    0     0     0
TracIn-1        12     2     6     7         43     6     11    19
TracIn-3        36     6     10    14        130    9     14    20
TracIn-5        60     6     10    14        215    10    13    19
Forward-INF     0.3    100   100   100       1.5    95    100   100

Motivation. Data leakage refers to an often unintended mistake by the creator of a machine learning model, in which information is accidentally shared between the test and training datasets. In this task, we apply data influence estimation methods to detect data leakage. Intuitively, for a given test point, if there is leakage into the training set, the corresponding leaked duplicate should have the highest influence. To evaluate the detection performance on the leaked samples, we use the top-k detection rate metric.
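The top-k detection rate can be computed directly from influence scores. The helper below is an illustrative sketch (the function and variable names are ours, not the paper's): for each test query, it checks whether the leaked duplicate lands among the k training points with the highest influence.

```python
import numpy as np

def top_k_detection_rate(scores, leaked_idx, ks=(1, 5, 10)):
    """Fraction of test queries (in %) whose leaked duplicate appears
    among the k highest-influence training points. scores[q] holds the
    influence of every training point on test query q; leaked_idx[q] is
    the index of the training point duplicated in that query."""
    scores = np.asarray(scores)
    rates = {}
    for k in ks:
        topk = np.argsort(-scores, axis=1)[:, :k]  # best-first ranking
        hits = [leaked_idx[q] in topk[q] for q in range(len(scores))]
        rates[f"T-{k}"] = 100.0 * np.mean(hits)
    return rates

# Two queries over five training points; query 0's duplicate (index 2)
# has the top score, query 1's duplicate (index 4) is ranked second.
scores = [[0.1, 0.2, 0.9, 0.0, 0.3],
          [0.5, 0.1, 0.2, 0.0, 0.4]]
rates = top_k_detection_rate(scores, leaked_idx=[2, 4])
# rates == {"T-1": 50.0, "T-5": 100.0, "T-10": 100.0}
```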

Setup. For the training data leakage evaluation, we use ResNet-18 (RN18) and ResNet-50 (RN50) classifiers [14], trained on CIFAR-10 and CIFAR-100 [22], respectively. To simulate data leakage, we set D_tst ⊂ D_trn. For the baseline methods, we vary the key hyperparameters (i.e., the depth of the Taylor expansion used by LiSSA in approximating the Hessian for IF [2, 26], and the number of checkpoints for TracIn). We present the details, along with results on ImageNet100, in Appendix E.

Result. As shown in Table 2, our approach is effective at identifying duplicated samples, with 100% and 95% top-1 detection accuracy for the RN18 and RN50 classifiers, respectively. We observe that with the larger model, RN50, the performance of IF deteriorates when detecting data leakage. This result is consistent with [5], which finds that IFs are poor estimates of the impact of excluding a training point for neural networks. Regarding the depth parameter, the algorithm's convergence to α^{-1}(G + λI)^{-1} v depends on the condition α(G̃ + λI) ⪯ I being valid at every step [Grosse et al., 2023], which is rarely the case for large and complex models. Hence, errors can accumulate with each iteration, leading to the worst result for IF-10000. Moreover, using more iterations also increases the overall computational cost. It is interesting to note that TracIn also exhibits poor performance in data leakage detection. Although it often identifies samples visually similar to the test example, it fails to accurately detect the ground-truth leaked sample, as illustrated in Figure 12.
This discrepancy arises because, although the gradient of a duplicated sample may align with that of a test sample, numerous other training samples may align in a similar direction but with greater magnitude, yielding even higher scores than the duplicate itself. Additionally, TracIn in large-scale models often relies on last-layer gradient information [3], which may lead to a suboptimal approximation of the ground-truth attribution. As shown in Table 2, Forward-INF achieves the best scores in terms of both efficacy and computational efficiency (benchmarked on an NVIDIA GeForce RTX 2080 Ti), while the other approaches suffer from a significant computational bottleneck as well as difficulty in finding the duplicated pairs.

4.3 Memorization Analysis

[Figure 4]

Motivation. Prior research [11] studied pairs of training and test samples in which memorization of the training point is important for predicting the test sample. Characterizing these memorized pairs is essential for understanding the learning mechanisms of neural networks. To that end, [11] proposed a re-training-based approach that directly evaluates the train-to-test influence in order to identify such so-called "influential pairs" (following their terminology). However, this approach entails high computational requirements. Here, we examine whether our approach, which is much more lightweight than [11], can identify the same pairs.

Setup. To ensure a fair comparison, we employ the same training algorithm as described in [11] to train ResNet-50 classifiers on CIFAR-100. We utilize the influential pairs provided by the authors for a qualitative comparison.

Results. In Figure 4, we compare the most influential training point as determined by our influence scores with the results from prior research. In each sub-figure, the first two panels display the test point and the most influential training point characterized by [11], while the third presents the top-influence point retrieved by Forward-INF. Figure 4 shows that our approach identifies the same influential pairs as [11], but without retraining models. As noted in [11], the high-influence pairs (with an influence score > 0.4) are near-duplicated samples and benefit the most from memorization, leading to high memorization scores. Further qualitative results are provided in Appendix E.

4.4 Mislabeled Data Detection

Motivation. Automated identification of incorrectly labeled samples in training datasets is essential, particularly given the high incidence of human labeling errors [19]. Human judgment often varies and can be subjective, leading to inconsistent labeling, a situation that is particularly pronounced for ambiguous samples. Reliable mislabeled data detection can substantially reduce the costs associated with human labeling by facilitating automated checks.

Setup. For our study, we select a random training subset D_trn of 2000 data points from the CIFAR-10 dataset, intentionally introducing label errors in 20% of these samples by assigning them to random classes. This data size is used due to the computational cost of the Influence Function (IF). We then train a ResNet-18 model for 100 epochs. We follow the same setting of calculating self-influence scores, i.e., the influence of a training point on itself, without relying on validation data (i.e., D_tst = D_trn). After computing the score for each training point, we sort the points in descending order of score and examine them for potential mislabeling in this order.

Result. Figure 5 shows the mislabeled data detection results. Our proposed method finds over 80% of the mislabeled samples within the first 300 checked samples, whereas IF would require going through 75% of the training samples to detect as many mislabeled samples, regardless of the sorting order. When computing self-influence, the gradient-based methods IF and TracIn in fact compute the magnitude of each point's gradient. Thus, even though the gradients of mislabeled training points may point in the opposite direction from those of clean samples, their magnitudes can be smaller than those of the clean ones, rendering mislabel detection unsuccessful. Our method overcomes this issue by computing the loss difference between the two models θ̂ and θ̂_{+D_tst}. The loss difference for clean samples is smaller than that for mislabeled ones: clean samples are very likely supported by other samples with a similar labeling distribution, while randomly mislabeled samples rarely have such support in the training dataset and are therefore prone to larger loss changes. In conclusion, our approach demonstrates high mislabeled data detection performance and computational efficiency compared to the existing baselines.
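The self-influence variant can be sketched on a toy one-dimensional model. This is an illustrative sketch under our own assumptions (the helper `self_influence`, the data, and the hyperparameters are not the paper's implementation): each point's own loss is maximized for K ascent steps, and the point is scored by its loss difference, so the unsupported mislabeled point incurs the largest change.

```python
import numpy as np

def self_influence(theta_hat, points, loss, grad, K=3, alpha=0.1):
    """Forward-INF self-influence sketch: for each training point, take K
    gradient-ascent steps on that point's own loss (D_tst = {z}), then
    score the point by its loss difference."""
    scores = []
    for x, y in points:
        theta = theta_hat.copy()
        for _ in range(K):
            theta = theta + alpha * grad(theta, x, y)
        scores.append(loss(theta, x, y) - loss(theta_hat, x, y))
    return scores

# Toy 1-d linear model with squared loss; the last label is flipped.
loss = lambda t, x, y: float((x * t[0] - y) ** 2)
grad = lambda t, x, y: np.array([2.0 * x * (x * t[0] - y)])
xs = [1.0, -1.0, 1.0, -1.0]
ys = [2.0, -2.0, 2.0, 2.0]  # true slope is 2; ys[3] is mislabeled
# Least-squares fit on the (partly corrupted) data plays theta_hat.
theta_hat = np.array([np.dot(xs, ys) / np.dot(xs, xs)])
scores = self_influence(theta_hat, list(zip(xs, ys)), loss, grad)
# The mislabeled point gets the largest self-influence score.
mislabeled_first = int(np.argmax(scores))
```

Sorting by these scores in descending order and inspecting from the top reproduces the detection procedure described above.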

[Figure 5]

4.5 Language Model Behavior Tracing

Motivation. With the growing prevalence of large language models in various applications, such as conversational agents [33, 32, 10], providing reasonable supporting evidence has become paramount. As a result, tracing the origin of a model's output back to specific data samples has become indispensable for identifying the responsible training data. Driven by this motivation, we study the task of tracing model behavior on factual assertions, which involves identifying the training examples responsible for inducing the model to make a factual assertion at test time.

Setup. We fine-tune an mT5 model [39] on the FTRACE-TREx dataset [3]. We consider each training example that conveys the same fact as a test example a "proponent" of that test example, and a "distractor" otherwise. We provide the details of the dataset and hyperparameter selection in Appendix D. Due to the demanding computation required by other direct train-to-test influence estimation methods, such as IF, here we only compare with TracIn. To make TracIn [28] more efficient, for each test sample we limit the scope of our search to a candidate set, i.e., a subset of the entire training dataset, following previous studies [3, 26]. We use the same evaluation metrics (precision and Mean Reciprocal Rank (MRR)) as the previous studies [3, 26]. To consider efficiency and performance simultaneously, we propose a time-dependent performance metric, i.e., performance under a limited time budget. This metric is realistic because, in practice, user-facing products cannot afford to spend an indefinite amount of time responding to a user's request.
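A common instantiation of these metrics scores, for each test query, how highly the proponents are ranked among the candidates. The helper below is an illustrative sketch (function and variable names are ours): MRR averages the reciprocal rank of the first proponent, and precision here is taken at the top-1 slot.

```python
import numpy as np

def mrr_and_precision_at_1(scores, is_proponent):
    """Mean reciprocal rank of the highest-ranked proponent, plus top-1
    precision, over a batch of queries. scores[q][c] is the influence of
    candidate c on query q; is_proponent[q][c] marks candidates that
    convey the queried fact."""
    rrs, hits = [], []
    for s, p in zip(np.asarray(scores), np.asarray(is_proponent)):
        order = np.argsort(-s)            # candidates, best first
        ranks = np.where(p[order])[0]     # positions of proponents
        rrs.append(1.0 / (ranks[0] + 1) if len(ranks) else 0.0)
        hits.append(bool(p[order[0]]))    # is the top-1 a proponent?
    return float(np.mean(rrs)), float(np.mean(hits))

# Query 0: the proponent is ranked first (RR = 1); query 1: the
# proponent is ranked second (RR = 1/2), so the top-1 slot is a miss.
scores = [[0.9, 0.1, 0.4], [0.3, 0.8, 0.1]]
is_proponent = [[1, 0, 0], [1, 0, 0]]
mrr, prec = mrr_and_precision_at_1(scores, is_proponent)
# mrr == 0.75, prec == 0.5
```

Under a time budget, only the queries a method manages to inspect contribute to these averages, which is what the time-dependent metric captures.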

Candidate Set Size    15K                  20K                  Inspected Queries
Method                MRR      Precision   MRR      Precision   # of Queries/Min
TracIn (Single)       0.1658   0.1367      0.1532   0.1300      307.522
TracIn (Multi)        0.1596   0.1300      0.1508   0.1300      307.522
Forward-INF           0.2101   0.1650      0.1927   0.1518      1306.323

Results. As shown in Table 3, Forward-INF outperforms TracIn on both metrics, as TracIn cannot inspect enough samples within the given time budget. Also, counter-intuitively, TracIn's performance drops when using multiple checkpoints compared with a single checkpoint; this finding is also reported in previous studies [3, 26]. In addition, Forward-INF is more than four times faster than TracIn in terms of the number of queries inspected per minute, even when we calculate only one layer's gradient for TracIn. Therefore, for large-scale models trained on vast quantities of data, the benefit of our method stands out. We further provide behavior-tracing experiments on paraphrased queries in Appendix E, as well as a comparison with a simple model-independent information retrieval approach [29].

5 Conclusion

Our contribution lies in the investigation of the Mirrored Influence Hypothesis. Expanding upon this hypothesis, we have developed a novel method to estimate train-to-test influence by solving the test-to-train influence problem. This approach involves evaluating the impact of incorporating a specific test set into the training set on the prediction of a training data source. Our method can be applied broadly and contribute meaningful insights across various settings. In particular, it outperforms traditional approaches that directly compute train-to-test influence by achieving an improved tradeoff between utility and efficiency.

Limitations & Future Work. The exploration of the Hypothesis unveils many avenues for future research. Although this paper excludes heuristic enhancements to Forward-INF, strategic layer selection and strategies to counter catastrophic forgetting [20] are expected to further improve our method's performance. Formulating a theoretical framework to formally validate the Hypothesis is another intriguing direction for future study.

6 Acknowledgment

We thank Hoang Anh Just and Himanshu Jahagirdar from the ReDS lab for their invaluable help in experiments and discussion. RJ and the ReDS lab acknowledge support through grants from the Amazon-Virginia Tech Initiative for Efficient and Robust Machine Learning, the National Science Foundation under Grant No. IIS-2312794, NSF IIS-2313130, NSF OAC-2239622, and the CCI SWVA Research Engagement Award.

References

  • Abramowitz and Stegun [1964] Milton Abramowitz and Irene A. Stegun. Handbook of mathematical functions with formulas, graphs, and mathematical tables. US Government Printing Office, 1964.
  • Agarwal et al. [2017] Naman Agarwal, Brian Bullins, and Elad Hazan. Second-order stochastic optimization for machine learning in linear time. The Journal of Machine Learning Research, 18(1):4148–4187, 2017.
  • Akyürek et al. [2022] Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, and Kelvin Guu. Tracing knowledge in language models back to the training data. arXiv preprint arXiv:2205.11482, 2022.
  • Barz and Denzler [2020] Björn Barz and Joachim Denzler. Do we train on test data? Purging CIFAR of near-duplicates. Journal of Imaging, 6(6):41, 2020.
  • Basu et al. [2020] Samyadeep Basu, Philip Pope, and Soheil Feizi. Influence functions in deep learning are fragile. arXiv preprint arXiv:2006.14651, 2020.
  • Cook and Weisberg [1982] R. Dennis Cook and Sanford Weisberg. Residuals and influence in regression. New York: Chapman and Hall, 1982.
  • Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
  • Dosovitskiy et al. [2020] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • Elsahar et al. [2018] Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Frederique Laforest, and Elena Simperl. T-REx: A large scale alignment of natural language with knowledge base triples. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
  • [10] Meta Fundamental AI Research Diplomacy Team (FAIR), Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067–1074, 2022.
  • Feldman and Zhang [2020] Vitaly Feldman and Chiyuan Zhang. What neural networks memorize and why: Discovering the long tail via influence estimation. Advances in Neural Information Processing Systems, 33:2881–2891, 2020.
  • Ghorbani and Zou [2019] Amirata Ghorbani and James Zou. Data Shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, pages 2242–2251. PMLR, 2019.
  • Hammoudeh and Lowd [2022] Zayd Hammoudeh and Daniel Lowd. Training data influence analysis and estimation: A survey. arXiv preprint arXiv:2212.04612, 2022.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • Ilyas et al. [2022] Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data. arXiv preprint arXiv:2202.00622, 2022.
  • Jia et al. [2019a] Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song. Efficient task-specific data valuation for nearest neighbor algorithms. arXiv preprint arXiv:1908.08619, 2019a.
  • Jia et al. [2019b] Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, and Costas J. Spanos. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1167–1176. PMLR, 2019b.
  • Just et al. [2023] Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, and Ruoxi Jia. LAVA: Data valuation without pre-specified learning algorithms. arXiv preprint arXiv:2305.00054, 2023.
  • Karimi et al. [2020] Davood Karimi, Haoran Dou, Simon K. Warfield, and Ali Gholipour. Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65:101759, 2020.
  • Kirkpatrick et al. [2017] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
  • Koh and Liang [2017] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In International Conference on Machine Learning, pages 1885–1894. PMLR, 2017.
  • Krizhevsky et al. [2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • Kwon and Zou [2021] Yongchan Kwon and James Zou. Beta Shapley: A unified and noise-reduced data valuation framework for machine learning. arXiv preprint arXiv:2110.14049, 2021.
  • Liu and Nocedal [1989] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.
  • Malladi et al. [2023] Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora. Fine-tuning language models with just forward passes. arXiv preprint arXiv:2305.17333, 2023.
  • Park et al. [2023] Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: Attributing model behavior at scale. arXiv preprint arXiv:2303.14186, 2023.
  • Petroni et al. [2019] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. Language models as knowledge bases? arXiv preprint arXiv:1909.01066, 2019.
  • Pruthi et al. [2020] Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. Advances in Neural Information Processing Systems, 33:19920–19930, 2020.
  • Robertson et al. [1995] Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. Okapi at TREC-3. NIST Special Publication SP, 109:109, 1995.
  • Rombach et al. [2022] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022.
  • Schuhmann et al. [2022] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.
  • Shuster etal. [2022]Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, EricMichael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, etal.Blenderbot 3: a deployed conversational agent that continually learns to responsibly engage.arXiv preprint arXiv:2208.03188, 2022.
  • Thoppilan etal. [2022]Romal Thoppilan, Daniel DeFreitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, etal.Lamda: Language models for dialog applications.arXiv preprint arXiv:2201.08239, 2022.
  • Vander Maaten and Hinton [2008]Laurens Vander Maaten and Geoffrey Hinton.Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008.
  • Wang and Jia [2023a]JiachenT. Wang and Ruoxi Jia.Data banzhaf: A robust data valuation framework for machine learning.International Conference on Artificial Intelligence and Statistics, 2023a.
  • Wang and Jia [2023b]JiachenT Wang and Ruoxi Jia.Data banzhaf: A robust data valuation framework for machine learning.In International Conference on Artificial Intelligence and Statistics, pages 6388–6421. PMLR, 2023b.
  • Wang etal. [2023]Sheng-Yu Wang, AlexeiA Efros, Jun-Yan Zhu, and Richard Zhang.Evaluating data attribution for text-to-image models.arXiv preprint arXiv:2306.09345, 2023.
  • Xiang etal. [2021]Zhen Xiang, David Miller, and George Kesidis.Post-training detection of backdoor attacks for two-class and multi-attack scenarios.In International Conference on Learning Representations, 2021.
  • Xue etal. [2020]Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel.mt5: A massively multilingual pre-trained text-to-text transformer.arXiv preprint arXiv:2010.11934, 2020.
  • Zeng etal. [2021]Yi Zeng, Si Chen, Won Park, Zhuoqing Mao, Ming Jin, and Ruoxi Jia.Adversarial unlearning of backdoors via implicit hypergradient.In International Conference on Learning Representations, 2021.

Appendix A Related work

We refer the readers to a recent survey [13] for a detailed account of research in data influence estimation. In this section, we focus on key ideas and representative prior works in this field and elaborate on their connections with our work.

Gradient-based influence estimation.

Intuitively, when models are trained on data, each training point leaves a unique gradient trace. Many studies measure the influence of data through the lens of gradient alignment between training and test samples. Koh and Liang [21] leveraged a classic concept from robust statistics, the Influence Function (IF) [6], to quantify training data influence. IF evaluates the effect of an infinitesimal change in the loss associated with an individual training point on the test loss. Computing IF requires inverting the Hessian matrix, which is prohibitively expensive for modern neural networks. To tackle this challenge, Koh and Liang [21] proposed computing IF via iterative approximation of Hessian-vector products (HVPs). Pruthi et al. [28] presented TracIn, a technique that estimates the influence of each training point by exploiting gradients over all training iterations. In particular, this influence estimator relies on the gradients of a test loss and a training loss. To scale up the approach, a practical alternative computes gradients at a few checkpoints rather than over all iterations. Our work studies the duality associated with these gradient-based influence quantification schemes and leverages it to propose a more efficient alternative that does not require calculating gradients for individual training points.
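For concreteness, the checkpoint-based TracIn estimator described above can be sketched in a few lines of numpy. This is an illustrative toy (a logistic model with hand-made checkpoints), not any released implementation; all names and values are hypothetical.

```python
import numpy as np

def grad_logistic(theta, x, y):
    """Per-example gradient of the logistic loss at parameters theta."""
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    return (p - y) * x

def tracin_influence(checkpoints, lrs, x_train, y_train, x_test, y_test):
    """TracIn-style influence: sum over checkpoints of the learning-rate-
    weighted dot product between a training and a test gradient."""
    score = 0.0
    for theta, lr in zip(checkpoints, lrs):
        g_tr = grad_logistic(theta, x_train, y_train)
        g_te = grad_logistic(theta, x_test, y_test)
        score += lr * float(g_tr @ g_te)
    return score

# Toy example: two checkpoints of a 3-parameter logistic model.
rng = np.random.default_rng(0)
ckpts = [rng.normal(size=3), rng.normal(size=3)]
x, y = np.array([1.0, 0.5, -0.2]), 1.0
s_self = tracin_influence(ckpts, [0.1, 0.1], x, y, x, y)    # self-influence
s_other = tracin_influence(ckpts, [0.1, 0.1], x, y, -x, 0.0)
```

Self-influence is always nonnegative here, since each summand is a squared gradient norm scaled by the learning rate.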

Re-training based influence estimation. Re-training-based methods follow a general recipe: train models on different training data subsets and examine how the performance of these models changes when a given training point is added to the subsets. Ilyas et al. introduced Datamodels [15], which involves training thousands of models to estimate the influence of each training datum. Specifically, the method learns a parameterized surrogate model that predicts model performance from the input training set; the surrogate is fit on pairs of an input subset and the corresponding model performance. Park et al. [26] proposed TRAK, which leverages several techniques to improve the efficiency of Datamodels: TRAK linearizes the model output function using a Taylor approximation and reduces the dimensionality of the linearized model with random projections. However, it still requires repeated model training.

Another line of research does not train surrogate models for data influence estimation; instead, it directly computes a weighted average of the changes in model performance in response to the addition of a training point across different subsets. Notable examples include the Shapley value [16, 12], Beta Shapley [23], and the Banzhaf value [35], which differ in the design of the weighting scheme over subsets. However, these methods face significant computational challenges for large models due to the need to retrain models on different subsets. Just et al. [18] recently proposed LAVA, a scalable solution that evaluates data influence on model performance using optimal transport. Despite its efficiency, LAVA does not provide a way to monitor the training data's contribution to the model's prediction on an individual test point.
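To see why such valuation schemes are expensive, consider the Shapley value computed exactly by averaging marginal contributions over permutations: every utility query corresponds to retraining the model on a subset. The sketch below substitutes a toy additive utility for actual retraining; names are illustrative.

```python
import numpy as np
from itertools import permutations

def shapley_values(n, utility):
    """Exact Shapley values by averaging marginal contributions over
    all permutations (feasible only for very small n)."""
    vals = np.zeros(n)
    perms = list(permutations(range(n)))
    for order in perms:
        subset = set()
        for i in order:
            before = utility(frozenset(subset))  # one "retraining" per query
            subset.add(i)
            vals[i] += utility(frozenset(subset)) - before
    return vals / len(perms)

# Toy utility: point i contributes weight w[i] to "model performance".
w = np.array([1.0, 2.0, 3.0])
util = lambda S: float(sum(w[i] for i in S))
sv = shapley_values(3, util)  # for an additive utility, Shapley value = w
```

For an additive utility the Shapley value recovers each point's individual contribution exactly; for real learners each `utility` call is a full training run, which is what makes these methods costly at scale.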

[Figure 6]

Table 4: Correlation scores across different validation set sizes.

Validation Set Size         50       100      1000
MNIST      Pearson       0.9208    0.9512    0.9652
           Spearman      0.9755    0.9764    0.9942
FMNIST     Pearson       0.9244    0.9611    0.9786
           Spearman      0.9568    0.9795    0.9964
CIFAR10    Pearson       0.6918    0.8401    0.8551
           Spearman      0.8274    0.9034    0.8848

Table 5: Correlation scores across different mislabeled ratios.

Mislabeled Ratio            20%       30%       50%
MNIST      Pearson       0.9621    0.9864    0.9640
           Spearman      0.9653    0.9915    0.9907
FMNIST     Pearson       0.9421    0.9857    0.9752
           Spearman      0.9479    0.9840    0.9915
CIFAR10    Pearson       0.7521    0.9697    0.8551
           Spearman      0.7838    0.9702    0.8848

[Figure 7]

Appendix B Additional Results on Empirical Study of the Mirrored Influence Hypothesis [Section 2.1]

In this section, we delve deeper into our Mirrored Influence Hypothesis, as introduced in Section 2.1, by presenting a more detailed analysis and additional results.

Analysis of correlation scores across different datasets.

We first demonstrate the validity of our hypothesis across different datasets under a near-noiseless setting and a noisy setting.

In the near-noiseless case, we train a logistic regression model using L-BFGS for 1000 iterations with L2 regularization set at 0.001. For the noisy setting, we train a convolutional neural network, consisting of two convolutional layers and two MLP layers, using SGD with a learning rate of 0.01, a weight decay of 0.001, and a momentum of 0.9. To increase the magnitude of each data point's influence, we select a subset of the data: 1050 samples from the MNIST and FMNIST datasets, and 900 samples from the CIFAR-10 dataset.

As shown in Figure 6, we observe high Pearson and Spearman correlation scores across the three datasets and two settings. In the near-noiseless setting, we find high correlation scores across all datasets. In the noisy setting, the CIFAR-10 results show a slight decline in correlation scores. This decrease can be attributed to the limited sample size from CIFAR-10, which potentially leads to higher loss and introduces more noise into the score calculation.
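The Pearson and Spearman scores reported here (and in Tables 4 and 5) can be computed as sketched below. This is a minimal numpy version (Spearman via a rank transform, without tie handling), shown for illustration rather than as the exact evaluation code.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two score vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks.
    Note: no averaging of tied ranks in this sketch."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return pearson(rank(a), rank(b))

r_p = pearson([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])   # exactly linear
r_s = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0])  # monotone
```

A perfectly linear relationship gives a Pearson score of 1, and any monotone relationship gives a Spearman score of 1, which is why both metrics are reported.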

Analysis of correlation scores across different sizes of the validation set.

Recall the equations for $\text{Inf}(D_i \rightarrow D_{\text{tst}})$ and $\text{Inf}(D_i \leftarrow D_{\text{tst}})$ from Section 1.

$$\text{Inf}(D_i \rightarrow D_{\text{tst}}) = \mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_{\text{tst}}) - \mathcal{L}(\mathcal{A}(D_{\text{trn}} \setminus D_i), D_{\text{tst}}).$$

On the other hand, the test-to-train influence characterized by P2 can be written as

$$\text{Inf}(D_i \leftarrow D_{\text{tst}}) = \mathcal{L}(\mathcal{A}(D_{\text{trn}} \cup D_{\text{tst}}), D_i) - \mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_i).$$

In Section 2.1, when we describe our hypothesis under the noisy setting, we use a group of test points $D_{\text{tst}}$ to calculate $\text{Inf}(D_i \leftarrow D_{\text{tst}})$ and $\text{Inf}(D_i \rightarrow D_{\text{tst}})$, as well as a group of training points (i.e., group-to-group influence). To evaluate the impact of the size of the validation set, we conduct experiments with varying validation set sizes while holding the other factors fixed (e.g., the mislabeled ratio and hyperparameters).
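Both counterfactual quantities can be computed directly when the learner is cheap to rerun. The sketch below uses closed-form ridge regression as a stand-in for the learning algorithm $\mathcal{A}$; it is illustrative only, since for deep networks each call to `train` would require a full retraining run.

```python
import numpy as np

def train(X, y):
    """Stand-in for the learning algorithm A: ridge regression in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)

def loss(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2))

def inf_train_to_test(X_trn, y_trn, i, X_tst, y_tst):
    """Inf(D_i -> D_tst): change in test loss when D_i is removed."""
    full = train(X_trn, y_trn)
    mask = np.arange(len(y_trn)) != i
    without = train(X_trn[mask], y_trn[mask])
    return loss(full, X_tst, y_tst) - loss(without, X_tst, y_tst)

def inf_test_to_train(X_trn, y_trn, i, X_tst, y_tst):
    """Inf(D_i <- D_tst): change in D_i's loss when D_tst joins training."""
    full = train(X_trn, y_trn)
    aug = train(np.vstack([X_trn, X_tst]), np.concatenate([y_trn, y_tst]))
    xi, yi = X_trn[i:i + 1], y_trn[i:i + 1]
    return loss(aug, xi, yi) - loss(full, xi, yi)

rng = np.random.default_rng(0)
X, yv = rng.normal(size=(40, 5)), rng.normal(size=40)
Xt, yt = rng.normal(size=(10, 5)), rng.normal(size=10)
fwd = [inf_train_to_test(X, yv, i, Xt, yt) for i in range(40)]
bwd = [inf_test_to_train(X, yv, i, Xt, yt) for i in range(40)]
```

The hypothesis is validated empirically by correlating score vectors such as `fwd` and `bwd` above across training points.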

Table 4 shows that enlarging the validation set (e.g., from 50 to 1000) is advantageous across all datasets. The rationale behind this observation is that a larger validation set $|D_{\text{tst}}|$ makes the scoring of each term (i.e., $\mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_{\text{tst}})$ and $\mathcal{L}(\mathcal{A}(D_{\text{trn}} \setminus D_i), D_{\text{tst}})$ in $\text{Inf}(D_i \rightarrow D_{\text{tst}})$) less sensitive. In particular, a larger validation set for $\text{Inf}(D_i \rightarrow D_{\text{tst}})$ removes the sensitivity to the choice of validation samples, leading to a more accurate estimate of each group's effect.
Additionally, introducing a larger number of clean samples (i.e., $D_{\text{tst}}$) into $D_{\text{trn}}$ may induce a larger difference between $\mathcal{L}(\mathcal{A}(D_{\text{trn}} \cup D_{\text{tst}}), D_i)$ and $\mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_i)$ due to the amplified negative effect. The second row of Figure 7 further illustrates that enlarging the validation set enhances the impact of each group. In particular, as depicted by the steeper and smoother blue lines (i.e., $\mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_{\text{tst}})$) in the second row of the figure, a larger clean validation set helps mitigate the noise attributable to the stochastic nature of the learning process.

Analysis of correlation scores across different mislabeled ratios.

In our analysis, we also consider the impact of different mislabeling ratios within the training dataset $D_{\text{trn}}$. Table 5 underscores the importance of choosing an appropriate mislabeled ratio. This factor is crucial for mitigating stochasticity by amplifying the influence of each group when analyzing correlation scores in a noisy setting. Our empirical study indicates that a mislabeling ratio exceeding 30% tends to yield less noisy results, since it amplifies meaningful signals such as clear differences between groups. As depicted in the third row of Figure 7, a 20% mislabeling ratio is insufficient to reduce noise in the score calculation (i.e., there is more fluctuation in the curve of $\mathcal{L}(\mathcal{A}(D_{\text{trn}} \setminus D_i), D_{\text{tst}})$).

It is important to avoid excessively high mislabeling ratios, such as over 50%, as they can adversely affect the learning process. For example, with too many mislabeled samples, the model struggles to fit the dataset and tends to underfit. This makes it difficult to differentiate signals between groups, because having many mislabeled samples across different groups may yield a high loss for both $\mathcal{L}(\mathcal{A}(D_{\text{trn}}), D_i)$ and $\mathcal{L}(\mathcal{A}(D_{\text{trn}} \setminus D_i), D_{\text{tst}})$, as shown in the third row of Figure 7. This high loss in the initial stage may contain not only signals of each group's influence but also additional noise that prevents one from magnifying the influence of each group.
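The mislabeled training sets used in these experiments can be constructed by flipping a fixed fraction of labels to a different random class. A minimal sketch, with an illustrative helper name:

```python
import numpy as np

def mislabel(labels, ratio, num_classes, seed=0):
    """Flip a fraction `ratio` of labels to a different random class.
    Returns the noisy labels and the indices that were flipped."""
    rng = np.random.default_rng(seed)
    labels = np.array(labels)
    idx = rng.choice(len(labels), size=int(ratio * len(labels)), replace=False)
    for i in idx:
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    return labels, idx

y = np.zeros(100, dtype=int)                 # 100 clean labels of class 0
y_noisy, flipped = mislabel(y, 0.3, num_classes=10)  # 30% mislabeled
```

Because the flipped class is always different from the original, the realized noise rate equals the requested ratio exactly.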

Appendix C Continual Learning vs. Training from Scratch [Section 3]

As we are considering the new objective of adding a test set $D_{\text{tst}}$ to the training dataset $D_{\text{trn}}$, the model trained with the new objective is $\hat{\theta}_{\varepsilon, D_{\text{tst}}} = \arg\min_{\theta} \mathcal{L}(\theta, D_{\text{trn}}) + \varepsilon \ell(\theta, D_{\text{tst}})$. Once $\hat{\theta}_{\varepsilon, D_{\text{tst}}}$ is obtained, one can evaluate the change in the loss of individual training points:

$$\text{Forward-INF}(D_i) = \mathcal{L}(\hat{\theta}_{\varepsilon, D_{\text{tst}}}, D_i) - \mathcal{L}(\hat{\theta}, D_i) \quad (9)$$

The main challenge here is to efficiently obtain $\hat{\theta}_{\varepsilon, D_{\text{tst}}}$ from $\hat{\theta}$. Note that for any $\theta$ that is close to $\hat{\theta}$, we have

$$
\begin{aligned}
\mathcal{L}(\theta, D_{\text{trn}}) + \varepsilon \ell(\theta, D_{\text{tst}})
&= \mathcal{L}(\hat{\theta}, D_{\text{trn}}) + \varepsilon \ell(\hat{\theta}, D_{\text{tst}})
  + (\theta - \hat{\theta}) \cdot \nabla_{\theta}\big[\mathcal{L}(\theta, D_{\text{trn}}) + \varepsilon \ell(\theta, D_{\text{tst}})\big]\Big|_{\theta = \hat{\theta}}
  + \mathcal{O}(\|\theta - \hat{\theta}\|_{2}^{2}) \\
&\approx \mathcal{L}(\hat{\theta}, D_{\text{trn}}) + \varepsilon \ell(\hat{\theta}, D_{\text{tst}})
  + (\theta - \hat{\theta}) \cdot \big[\nabla_{\theta}\mathcal{L}(\theta, D_{\text{trn}}) + \varepsilon \nabla_{\theta}\ell(\theta, D_{\text{tst}})\big]\Big|_{\theta = \hat{\theta}} && (10) \\
&\approx \mathcal{L}(\hat{\theta}, D_{\text{trn}}) + \varepsilon \ell(\hat{\theta}, D_{\text{tst}})
  + (\theta - \hat{\theta}) \cdot \varepsilon \nabla_{\theta}\ell(\theta, D_{\text{tst}})\Big|_{\theta = \hat{\theta}} && (11)
\end{aligned}
$$

where the first approximation holds as $\varepsilon \rightarrow 0$ under continuity of the loss function, and the second approximation holds because $\nabla_{\theta}\mathcal{L}(\theta, D_{\text{trn}})\big|_{\theta = \hat{\theta}} = 0$, since $\hat{\theta}$ is by definition the minimizer of $\mathcal{L}(\theta, D_{\text{trn}})$. Taking the $\arg\min$ on both sides of (10) yields

$$
\begin{aligned}
\hat{\theta}_{\varepsilon, D_{\text{tst}}}
&= \arg\min_{\theta}\big[\mathcal{L}(\theta, D_{\text{trn}}) + \varepsilon \ell(\theta, D_{\text{tst}})\big] \\
&\approx \arg\min_{\theta}\Big[\mathcal{L}(\hat{\theta}, D_{\text{trn}}) + \varepsilon \ell(\hat{\theta}, D_{\text{tst}})
  + (\theta - \hat{\theta}) \cdot \varepsilon \nabla_{\theta}\ell(\theta, D_{\text{tst}})\Big|_{\theta = \hat{\theta}}\Big] && (12) \\
&= \arg\min_{\theta}\,(\theta - \hat{\theta}) \cdot \varepsilon \nabla_{\theta}\ell(\theta, D_{\text{tst}})\Big|_{\theta = \hat{\theta}} && (13)
\end{aligned}
$$

Note that to find the minimum of (12), one needs to search along $\nabla_{\theta}\ell(\theta, D_{\text{tst}})\big|_{\theta = \hat{\theta}}$, i.e., the current gradient direction at the test sample. Thus, to find $\hat{\theta}_{\varepsilon, D_{\text{tst}}}$, given that $\varepsilon \rightarrow 0$ and $(\hat{\theta}_{\varepsilon, D_{\text{tst}}} - \hat{\theta})$ is small assuming continuity of the loss function, one only needs to continually update the trained model $\hat{\theta}$ on the given test sample $D_{\text{tst}}$.

In standard practice, $\varepsilon$ is often taken to be positive, as it represents the model being trained on the test sample (i.e., $\varepsilon \rightarrow 0^{+}$). However, if the test sample is drawn from a similar distribution as the training data, the magnitude of the gradient $\nabla_{\theta}\ell(\theta, D_{\text{tst}})\big|_{\theta = \hat{\theta}}$ is small. Interestingly, and counter-intuitively, we can estimate data influence with impressive accuracy by setting $\varepsilon$ to be negative and employing gradient ascent on the test sample. Namely, we define the following mirrored metric

$$
\text{Forward-INF}(D_i) := \left.\frac{\ell(\hat{\theta}_{\varepsilon, D_{\text{tst}}}, D_i) - \ell(\hat{\theta}, D_i)}{\varepsilon}\right|_{\varepsilon \rightarrow 0^{-}} \quad (14)
$$

This technique successfully circumvents numerical issues from diminished gradients on well-trained models and remarkably enhances the accuracy of influence estimation.
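Putting the pieces together, the Forward-INF computation amounts to a few gradient-ascent steps on the test batch followed by one forward pass per training point. The sketch below instantiates this for a numpy logistic model; the learning rate, step count, and the use of the raw loss difference (a sign/scale factor away from Eq. (14)) are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def loss(theta, X, y):
    """Mean logistic loss."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad(theta, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def forward_inf(theta_hat, X_trn, y_trn, X_tst, y_tst, lr=0.1, steps=3):
    """Forward-INF sketch: gradient *ascent* on the test batch (negative
    epsilon), then one forward pass per training point to score its
    loss change."""
    theta = theta_hat.copy()
    for _ in range(steps):
        theta += lr * grad(theta, X_tst, y_tst)  # ascent on the test loss
    # Per-training-point loss deltas: forward passes only, no per-point grads.
    return np.array([loss(theta, X_trn[i:i + 1], y_trn[i:i + 1])
                     - loss(theta_hat, X_trn[i:i + 1], y_trn[i:i + 1])
                     for i in range(len(y_trn))])

rng = np.random.default_rng(0)
X, yv = rng.normal(size=(30, 4)), (rng.random(30) > 0.5).astype(float)
theta0 = np.zeros(4)
for _ in range(200):                 # pre-train theta_hat by gradient descent
    theta0 -= 0.5 * grad(theta0, X, yv)
scores = forward_inf(theta0, X, yv, X[:5], yv[:5])  # first 5 points as "test"
```

The key efficiency property is visible in the loop structure: gradients are computed only for the small test batch, while each of the many training points costs just a forward pass.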

Appendix D Hyperparameter Details [Section 4]

As our approach is based on gradient ascent (i.e., maximization), selecting appropriate hyperparameters (e.g., the learning rate and the number of epochs) is essential. We note that the details of the experimental setting for Section 2.1 are elaborated in Section B. In this section, we mainly focus on presenting the details of the application experiments from Section 4.

Data influence estimation in diffusion models [Section 4.1].

In this experiment, it is necessary to fine-tune a pre-trained stable diffusion model on a set of test samples. To do so, we follow the same setting as the previous work [37]. In particular, we randomly pick fine-tuning exemplars from ImageNet [7] and further train the pre-trained stable diffusion model on the selected exemplars for 5 iterations with a learning rate of 1e-5 and a batch size of 4. After generating the synthesized samples, we perform gradient ascent on a set of synthesized samples with the same learning rate for 3 iterations.

Data leakage detection [Section 4.2].

We train both ResNet-18 and ResNet-50 on CIFAR-10 and CIFAR-100 [22] using the Adam optimizer with a learning rate of 0.01 for 200 iterations. Due to the computational cost of our baselines (IF, TracIn), we randomly select 20 test samples for evaluation. In this study, we first set aside 20% of the test samples for hyperparameter selection and dedicate the remaining portion to evaluation.

Model behavior tracing [Section 4.5].

In this study, we leverage the FTRACE-TREx dataset [3] for the model behavior tracing task. The FTRACE-TREx dataset consists of a set of "abstracts" and a set of "queries", and each query is annotated with the corresponding list of fact samples [3]. The training set of FTRACE-TREx is sourced from the TREx dataset [9], and its test set is derived from the LAMA dataset [27]. Every training example that conveys the same information as a given test example is designated as a "proponent." We randomly sampled 100 data points for this task and averaged over three repeated experiments.

Appendix E Further Analysis and Details

Table 6: Data leakage detection on ImageNet-100.

Model/Dataset              Metric   IF-100   Forward-INF
ResNet-18 / ImageNet-100   T-1      0.000    0.880
ResNet-18 / ImageNet-100   T-5      0.000    0.880

[Figure 8]

E.1 Analysis of Correlation between Forward-INF(D_i) and Inf(D_i ← D_tst)

Table 7: Correlation between Forward-INF(D_i) and Inf(D_i ← D_tst) across validation set sizes.

                      Validation Set Size
Dataset    Metric     50       100      500
MNIST      Pearson    0.9701   0.9819   0.9951
           Spearman   0.9746   0.9786   0.9942
FMNIST     Pearson    0.9450   0.9811   0.9942
           Spearman   0.9479   0.9795   0.9920
CIFAR10    Pearson    0.9050   0.9528   0.9862
           Spearman   0.9052   0.9413   0.9832

[Figure 9]

We want to demonstrate that our influence score approximator, Forward-INF, can effectively estimate the influence score Inf(D_i ← D_tst), thereby obviating the need for re-training, since by the Mirrored Influence Hypothesis it carries the same influence information as Inf(D_i → D_tst). For this experiment, we chose the noisy setting and utilized the same groups as in previous tests under that setting. We performed gradient ascent with respect to a group of test samples D_tst and then computed the loss difference between the original and the unlearned models for different groups. We used the SGD optimizer for gradient ascent with a learning rate of 0.01, a weight decay of 0.001, and two iterations.
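The correlation metrics reported in this analysis can be computed for any pair of score vectors with a small numpy helper. A minimal sketch (the function names are ours, and the Spearman variant below assumes no ties among the scores):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman correlation = Pearson on ranks (assumes no tied scores)."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v)))
    return pearson(rank(a), rank(b))
```

A perfectly monotone but nonlinear relation between two score vectors yields a Spearman correlation of 1 while the Pearson correlation stays below 1, which is why both are reported.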

Table 7 presents the correlation scores between Forward-INF and Inf(D_i ← D_tst). As indicated in the table, there is a consistently high correlation between our proposed method and Inf(D_i ← D_tst). We also observe that increasing the size of the validation set yields higher correlation scores, attributable to reduced noise in the learning process. We provide the correlation plots in Figure 9.

E.2 Data Influence Estimation in Diffusion Model [Section 4.1]

In this section, we present both quantitative and qualitative results of data influence estimation for diffusion models. As delineated in [37], since the fine-tuning examples can serve as noisy ground truth, we can quantitatively measure whether Forward-INF identifies the ground-truth sample as the most influential one (Top-1). In our experiment, we randomly sample 20 different objects from ImageNet and construct candidate sets of different sizes (e.g., 100, 1K, 10K, 20K), following the same candidate set creation process described in Section 4.1. If Forward-INF correctly identifies the ground truth as the most influential point, we mark it as a success and average over the 20 points. The detection results are presented in Table 8. Forward-INF correctly identifies the ground-truth fine-tuning examples across all candidate set sizes, demonstrating its robustness.
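The Top-1 (and more generally top-k) detection metric reduces to an argmax check per query, averaged over queries. A minimal sketch, with illustrative names and inputs:

```python
import numpy as np

def topk_detection_rate(scores, gt_indices, k=1):
    """scores: (num_queries, num_candidates) influence scores.
    gt_indices: ground-truth candidate index for each query.
    Returns the fraction of queries whose ground truth lands in the top-k."""
    scores = np.asarray(scores)
    hits = 0
    for row, gt in zip(scores, gt_indices):
        topk = np.argsort(row)[::-1][:k]  # indices of the k largest scores
        hits += int(gt in topk)
    return hits / len(gt_indices)
```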

In addition, we provide qualitative results in Figure 10. In this experiment, we utilize a candidate set comprising 20K samples and present the top 8 most influential training samples retrieved by Forward-INF. Most of the retrieved samples share similar features with each other, on either the image or the caption side. This indicates that the unlearning process mostly affects samples that share similar features. Given that the candidate set is curated selectively rather than drawn from the entire dataset, it may not contain enough captions precisely identical to those of the ground-truth samples to fill the top-k. In this case, Forward-INF might yield some samples that are not directly relevant. The crucial observation, however, is that despite the presence of candidate samples exhibiting features similar to the ground truth in both image and caption aspects, Forward-INF can accurately identify the ground truth as the most influential training data, highlighting its efficacy in pinpointing the key training influences behind the generation of synthesized images.

Applying other influence approximators, such as IF and TracIn, to large-scale diffusion models poses an efficiency challenge, as delineated in [37]. Specifically, IF needs to approximate the inverse Hessian matrix as well as calculate gradients with respect to training and test samples. This becomes problematic for a large-scale stable diffusion model (e.g., around 860 million parameters) trained on a large amount of data (e.g., around 2 billion samples) [37]. TracIn, on the other hand, requires computing the dot product between gradients with respect to every training and test sample, which does not scale to large models. As a result, prior work [3] restricts gradient computation to a single layer: the first layer for large-scale language models, and, for classification models, commonly the last layer. However, neither baseline has been widely explored or applied to diffusion models with both high efficiency and strong performance. We therefore defer addressing these scalability challenges within such baselines to future research.

[Figure 10]
[Figure 11]

Table 8: Top-1 detection results across candidate set sizes.

Candidate Set Size   100     1K      10K     20K
T-1 Detection        1.000   1.000   1.000   1.000

[Figure 12]

E.3 Data Leakage Detection [Section 4.2]

We further extend our data leakage detection experiment to ImageNet-100 to validate both practicality and scalability. We observe a high detection rate (e.g., an 88% top-1 detection rate) for the ResNet-18 classifier trained on ImageNet-100. Compared with the previous results on CIFAR-10 and CIFAR-100, the detection rate decreases slightly as the dataset becomes larger and more complex. We hypothesize that the unlearning process affects more diverse neurons and their connections when larger and more complex datasets are used, leading to a drop in performance. However, this performance is still high compared with baselines such as IF, which fails completely, as shown in Table 6.

Unlearning dynamics.

We additionally conduct a feature embedding analysis to visualize the gradient ascent dynamics of our method. In Figure 8, we present how the feature space changes with the maximization steps. Interestingly, we observe that the duplicated sample (i.e., top-1) is strongly impacted by the gradient ascent process, deviating from its class cluster during unlearning. The bottom row shows examples that receive top-5 scores out of all training samples, indicating that our approach correctly identifies both the exact duplicated sample and relevant near duplicates. This observation supports our intuition that unlearning one specific point should cause a large training loss change for class- or sample-related points, which therefore receive high Forward-INF scores.

Sensitivity analysis of hyperparameter selection.

Considering that the unlearning process may be sensitive to hyperparameters (e.g., the learning rate and the number of iterations), our method initially selects hyperparameters based on a holdout validation set. Accordingly, we also present results on sensitivity to the validation set size. As illustrated in Table 9, Forward-INF still achieves 100% detection accuracy even with only 25 validation samples, indicating low sensitivity to the size of the validation set.
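The holdout-based selection can be sketched as a simple grid search that maximizes detection accuracy on the validation split. In the sketch below, the grid values and the `evaluate` callback (assumed to run the method on the holdout split and return its top-1 detection accuracy) are illustrative assumptions:

```python
from itertools import product

def select_hyperparams(grid, evaluate):
    """Pick the (lr, steps) pair maximizing validation detection accuracy.
    `evaluate(lr, steps)` is a user-supplied callback returning accuracy."""
    best_cfg, best_acc = None, float("-inf")
    for lr, steps in product(grid["lr"], grid["steps"]):
        acc = evaluate(lr, steps)
        if acc > best_acc:
            best_cfg, best_acc = (lr, steps), acc
    return best_cfg, best_acc
```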

Table 9: Detection accuracy of Forward-INF across validation set sizes (as percentages of the test set).

Model      Dataset    Metric   5% [25 samples]   10% [50 samples]   20% [100 samples]   30% [150 samples]
ResNet-18  CIFAR-100  T-1      1.000             1.000              1.000               1.000
ResNet-18  CIFAR-100  T-5      1.000             1.000              1.000               1.000

E.4 Memorization Analysis [Section 4.3]

We provide extensive qualitative results for memorization analysis in Figure 11. As the figure shows, Forward-INF can effectively retrieve the most influential point at a much lower cost than the previous work [11]. In addition, we consistently observe that highly influential pairs are exact or near duplicates and benefit the most from memorization (see the corresponding memorization scores in the figure).

E.5 Model Behavior Tracing [Section 4.5]

In this section, we delve deeper into the problem of model behavior tracing, expanding the scope of our discussion. In practice, it is inherently challenging to locate direct supporting samples in the training data that precisely correspond to a given test sample; the relationship between training samples and the answer generated for a test example is often more indirect. In light of this, we design an additional experiment in which the test query is paraphrased. The key question is whether we can still identify the relevant ground-truth samples in the training data.

As demonstrated in Table 10, our proposed approach, Forward-INF, consistently performs favorably, albeit with a minor drop, and outperforms the TracIn baseline in terms of MRR and Precision for the 15K candidate set size.
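For reference, the retrieval metrics can be sketched as follows, assuming MRR is the mean reciprocal rank of the first retrieved proponent and Precision is taken as precision@k; the exact definitions used for Table 10 may differ, and all names here are illustrative:

```python
def mean_reciprocal_rank(rankings, relevant):
    """rankings: per-query candidate ids ordered by descending influence.
    relevant: per-query set of proponent ids.
    Averages 1/rank of the first relevant candidate (0 if none retrieved)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for pos, cand in enumerate(ranked, start=1):
            if cand in rel:
                total += 1.0 / pos
                break
    return total / len(rankings)

def precision_at_k(rankings, relevant, k=10):
    """Average fraction of the top-k retrieved candidates that are proponents."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        total += sum(c in rel for c in ranked[:k]) / k
    return total / len(rankings)
```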

Table 10: Model behavior tracing with paraphrased queries.

                   Candidate Set Size: 15K                Inspected Queries
Method             MRR      Δ        Precision   Δ        # of Queries/Min
TracIn (single)    0.1868   -        0.1549      -        396.125
Forward-INF        0.2031   0.0163   0.1551      0.0002   1289.475

[Figure 13]

Comparison with a simple model-independent information retrieval baseline [29].

BM25 is a standard information retrieval technique that selects proponents by retrieving training examples with high lexical overlap with the query; it was adopted to trace facts in [3]. However, since the scores derived from BM25 do not incorporate any information about the specific model under consideration, they cannot explain why a given model makes certain factual assertions. For instance, when our LLM produces an incorrect prediction with a high loss, we want to trace back, for the sake of transparency and interpretability, the contributing examples that led to this misjudgment. While BM25's performance in identifying pertinent facts is promising [3], it falls short of shedding light on the model's intricate behavior (i.e., it only provides the related facts but does not explain why the model incurs a high loss). Conversely, our proposed method is adept at elucidating the underlying reasons behind the LLM's inaccurate answers. For example, as shown in Figure 13, the top-1 pair retrieved by our method includes the target label "France," yet the overall meaning of the sentence is different. This observation implies the presence of semantic conflicts (i.e., some samples share the same labels but carry different semantic meanings) within the training samples, which in turn confuse the model's learning process.
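For concreteness, a minimal Okapi BM25 scorer over tokenized documents might look as follows. This is a sketch with standard k1/b defaults, not the implementation used in [3]:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25: score each doc (a token list) by lexical overlap with `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

As the scorer makes explicit, BM25 depends only on token statistics; no quantity in it reflects the trained model, which is why it cannot attribute a model's high loss to specific training examples.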
