Hello,
I am trying to understand the auxiliary function trick for importance scores that are linear with respect to the black-box model. In the paper it boils down to this:
From the text I don't quite get the meaning of $\tilde{x}$.
What is effectively done when one wants to use a feature importance attribution method on an embedding network $f$ for a specific sample/image $x$?
Do we calculate the dot product of the vector $f(x)$ with itself and apply the method to that scalar output (i.e. $x = \tilde{x}$)? Or does $\tilde{x}$ come from somewhere else, and we compute $f(x) \cdot f(\tilde{x})$?
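To make my reading of the trick concrete, here is a minimal sketch of the first interpretation, assuming $\tilde{x}$ ranges over perturbed inputs while $f(x)$ is held fixed, so the auxiliary function $g_x(\tilde{x}) = f(\tilde{x}) \cdot f(x)$ is scalar and any standard attribution method can be applied to it (the linear embedding `W`, the function names, and the gradient-times-input attribution are all my own toy choices, not from the paper):

```python
# Toy sketch of the auxiliary-function trick: wrap the vector-valued embedding
# f into a scalar function g_x(x_tilde) = <f(x_tilde), f(x)> with x frozen,
# then run a scalar attribution method on g_x, evaluated at x_tilde = x.
# A linear embedding f(x) = W @ x stands in for the black-box network.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # toy embedding: R^3 -> R^4

def f(x):
    return W @ x

def g(x_tilde, x_fixed):
    # Scalar auxiliary function: inner product with the frozen embedding f(x).
    return f(x_tilde) @ f(x_fixed)

def grad_times_input(x):
    # For this linear toy model, the gradient of g w.r.t. x_tilde (with
    # x_fixed frozen) is W.T @ f(x); gradient * input gives per-feature scores.
    grad = W.T @ f(x)
    return grad * x

x = rng.standard_normal(3)
scores = grad_times_input(x)
# In the linear case the scores sum exactly to g(x, x) = ||f(x)||^2.
print(np.allclose(scores.sum(), g(x, x)))  # True
```

Is this the intended construction, or is $\tilde{x}$ fixed to something else (e.g. a separate reference sample)?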
Also, could you please point to literature or elaborate on what it means for a feature importance score to be "linear with respect to the black-box"?
I would really appreciate an answer!
Kind regards