Add counterfactual analysis for regression models (What-If Tool) #2647

Merged: 11 commits, Sep 20, 2019

@grovina grovina commented Sep 12, 2019

  • Motivation for features / changes

Add the ability to show the nearest counterfactual example for regression models.

  • Technical description of changes

The feature and its use are analogous to the existing counterfactual analysis available in non-regression models.

There is one fundamental difference in the way we compare inference classes. In binary classification, the counterfactual examples are the ones belonging to the alternative class. In regression, we define them as the examples whose inferred values are farther from the selected example's inferred value than a given delta, which is set by the user.

For example, if delta is set to 2 and the inferred value of the selected example is 7, then we look for the most similar example with an inferred value outside the (5, 9) interval.
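
A minimal sketch of the regression criterion described above, not the code from this PR. The function name `isCounterfactual` and its parameters are hypothetical. Treating values exactly `delta` away as counterfactual (`>=`) is an assumption, chosen because it matches the open (5, 9) interval in the example.

```javascript
// Hypothetical helper: decides whether a candidate's inferred value is far
// enough from the selected example's inferred value to be a counterfactual.
// Assumption: a value exactly `delta` away counts as counterfactual, which
// matches "outside the (5, 9) interval" for delta = 2 and value = 7.
function isCounterfactual(selectedValue, candidateValue, delta) {
  return Math.abs(candidateValue - selectedValue) >= delta;
}

// Worked example from the description: delta = 2, selected value = 7.
const inferredValues = [4.5, 5, 6, 7, 8.9, 9, 12];
const counterfactuals = inferredValues.filter(
    (value) => isCounterfactual(7, value, 2));
console.log(counterfactuals); // [4.5, 5, 9, 12]
```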

  • Screenshots of UI changes

For regression only:
[Image: Screenshot 2019-09-11 at 20 51 07]

  • Detailed steps to verify changes work correctly (as executed by you)
  1. Open a binary classification problem (multi_demo) and check that nothing has changed.
  2. Open a regression problem (age_demo) and wait for data to load, then:
    • select an example
    • enable "Show nearest counterfactual datapoint"
    • play around with Delta and distance
    • repeat for some other datapoints

Plotting Distance to datapoint vs Inference value also helps to confirm that the counterfactual found is reasonable.

  • Alternate designs / implementations considered

Some other ways of defining the nearest counterfactual example:

  1. Define separate upper and lower deltas for the factual interval.
  2. Within a given distance in the feature space, find the datapoint with highest difference in inference value.
  3. Find the datapoint with the highest ratio between difference in inference value and distance in feature space.

These alternative ways could be added in the future.
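
To make alternatives 2 and 3 concrete, here is a rough sketch of how they could be scored. None of this is part of the PR: the candidate shape (`id`, `distance`, `value`) and both function names are hypothetical, and skipping zero-distance candidates in the ratio is my own choice to avoid dividing by zero.

```javascript
// Hypothetical candidate shape: feature-space distance to the selected
// example plus the candidate's inferred value.
const selectedValue = 7;
const candidates = [
  {id: 'a', distance: 1, value: 7.5},
  {id: 'b', distance: 2, value: 10},
  {id: 'c', distance: 5, value: 15},
];

// Alternative 2: within a maximum feature-space distance, pick the candidate
// whose inferred value differs most from the selected value.
function largestDifferenceWithin(candidates, selectedValue, maxDistance) {
  let best = null;
  let bestDiff = -Infinity;
  for (const c of candidates) {
    if (c.distance > maxDistance) continue;
    const diff = Math.abs(c.value - selectedValue);
    if (diff > bestDiff) {
      bestDiff = diff;
      best = c;
    }
  }
  return best; // null when no candidate is within maxDistance
}

// Alternative 3: pick the candidate with the highest ratio of
// inference-value difference to feature-space distance.
function highestRatio(candidates, selectedValue) {
  let best = null;
  let bestRatio = -Infinity;
  for (const c of candidates) {
    if (c.distance === 0) continue; // avoid division by zero
    const ratio = Math.abs(c.value - selectedValue) / c.distance;
    if (ratio > bestRatio) {
      bestRatio = ratio;
      best = c;
    }
  }
  return best;
}

console.log(largestDifferenceWithin(candidates, selectedValue, 3).id); // 'b'
console.log(highestRatio(candidates, selectedValue).id); // 'c'
```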

cc @jameswex @tolga-b

When computing closest counterfactuals, we need to evaluate whether two
datapoints are in the same class or not.
For regression, this is more than an ordinary comparison. Let's move this
evaluation to a separate method to keep the logic clear.

For regression, we add a way to specify the distance in inferred value
above which the points can be considered counterfactual.
Also add a small explanation to the information dialog.

Set the maximum slider value for the datapoint when calculating
nearest counterfactuals.
Since the counterfactual threshold is based on the distance between
inferred values, we calculate the maximum value distance between
the selected point and the others.

Ignore the example itself when looking for counterfactuals, since a 0
threshold in regression would mean that the closest counterfactual is
always the example itself, which gives no information.
By skipping the example, the 0 threshold means looking for the closest
[different] example, regardless of the difference in inferred values.
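
The search these commit messages describe could look roughly like the following. It is a sketch, not the code in the PR: `findNearestCounterfactual`, `l1Distance`, and the `{features, value}` point shape are hypothetical, and L1 distance is only one possible choice of feature-space distance.

```javascript
// Hypothetical sketch: each point is {features: number[], value: number},
// where `value` is the inferred (regression) value.
function l1Distance(a, b) {
  return a.reduce((sum, x, i) => sum + Math.abs(x - b[i]), 0);
}

function findNearestCounterfactual(points, selectedIndex, delta) {
  const selected = points[selectedIndex];
  let bestIndex = -1;
  let bestDistance = Infinity;
  points.forEach((point, i) => {
    // Skip the example itself, otherwise delta = 0 would always return it.
    if (i === selectedIndex) return;
    // Skip points whose inferred value is closer than delta (assumption:
    // a difference of exactly delta already counts as counterfactual).
    if (Math.abs(point.value - selected.value) < delta) return;
    const distance = l1Distance(point.features, selected.features);
    if (distance < bestDistance) {
      bestDistance = distance;
      bestIndex = i;
    }
  });
  return bestIndex; // -1 when no counterfactual exists
}

const points = [
  {features: [0, 0], value: 7},
  {features: [1, 0], value: 7},
  {features: [0, 3], value: 10},
  {features: [5, 5], value: 2},
];
console.log(findNearestCounterfactual(points, 0, 0)); // 1: closest different example
console.log(findNearestCounterfactual(points, 0, 2)); // 2
console.log(findNearestCounterfactual(points, 0, 4)); // 3
```

With a threshold of 0, the point at index 1 is returned even though its inferred value equals the selected one, which is the "regardless of the difference in inferred values" behavior described above.
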
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment `@googlebot I fixed it.` If the bot doesn't comment, it means it doesn't think anything has changed.


@grovina grovina force-pushed the grovina-wit-reg-counterfactual branch from e54b2df to e5ed88e Compare September 12, 2019 10:41
@googlebot

CLAs look good, thanks!


@jameswex jameswex left a comment

Thanks for getting this out so quickly! Some small comments.

Instead of 0, use the standard deviation of the inferred values as the
initial value for the counterfactual delta.
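
One way to compute such a default is sketched below. This is not the PR's code: `standardDeviation` is a hypothetical helper, and using the population formula (dividing by n rather than n - 1) is an assumption, since the commit message does not say which variant is used.

```javascript
// Hypothetical helper: population standard deviation of the inferred values.
// Assumption: divide by n; the sample variant would divide by n - 1 instead.
function standardDeviation(values) {
  if (values.length === 0) return 0;
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
      values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// Made-up inferred values with mean 5 and variance 4.
const initialDelta = standardDeviation([2, 4, 4, 4, 5, 5, 7, 9]);
console.log(initialDelta); // 2
```
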
@jameswex

@stephanwlee please review.

adjustMaxCounterfactualValueDist_: function(selected, valueStr) {
  this.maxCounterfactualValueDist = Math.max(
      this.distanceStats_[valueStr].max -
          this.visdata[selected][valueStr],

I don't know the nature of distanceStats_ and visdata: can all of these values be negative? Can the visdata value be lower than both the max and the min of distanceStats_? Could this end up computing, for example, Math.max(-2, -3)?

distanceStats_ is computed from the visdata values, so that:
distanceStats_[valueStr].min <= visdata[selected][valueStr] <= distanceStats_[valueStr].max

Both arguments to Math.max are therefore non-negative, even when the values themselves are negative.
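
A standalone sketch of that argument, for illustration only. The quoted diff is cut off after the first argument of `Math.max`, so the second term (`selected - min`) is my assumption about what follows, and `maxValueDistance` is a hypothetical name rather than the method in the PR.

```javascript
// Hypothetical standalone version of the slider-maximum computation.
// min and max are taken over the same values that contain the selected one,
// so min <= selected <= max always holds.
function maxValueDistance(values, selectedIndex) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const selected = values[selectedIndex];
  // Assumption: the second argument mirrors the first. Both differences are
  // >= 0 because of the invariant above, whatever the sign of the values.
  return Math.max(max - selected, selected - min);
}

// All inferred values negative: the result is still non-negative.
console.log(maxValueDistance([-9, -5, -3], 1)); // 4
// Selected value is the maximum: the first difference is 0.
console.log(maxValueDistance([1, 4, 10], 2)); // 9
```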

Addressing PR conversations.
@grovina grovina requested a review from stephanwlee September 17, 2019 10:49
@jameswex jameswex merged commit b7b68b1 into tensorflow:master Sep 20, 2019