---
title: "README"
author: "Kary Främling"
date: "`r Sys.Date()`"
output:
  md_document:
    variant: gfm
# output: html_notebook
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# ciu
This is an R implementation of the Contextual Importance and Utility (CIU) method for Explainable AI (XAI). CIU was developed by Kary Främling in his PhD thesis *Learning and Explaining Preferences with Neural Networks for Multiple Criteria Decision Making* (written in French; original title: *Modélisation et apprentissage des préférences par réseaux de neurones pour l'aide à la décision multicritère*), available online, for instance, at <https://tel.archives-ouvertes.fr/tel-00825854/document>. It was originally implemented in Matlab.
# What is CIU?
**Remark**: GitHub Markdown does not seem to render the "{" and "}" characters of LaTeX equations correctly, whereas RStudio shows them as intended. Therefore, where GitHub shows an $i$ below, it usually signifies `{i}`, and where it shows an $I$, it signifies `{I}`.
CIU is a model-agnostic method for producing outcome explanations of results of any "black-box" model `y=f(x)`. CIU directly estimates two elements of explanation by observing the behaviour of the black-box model (without creating any "surrogate" model `g` of `f(x)`).
**Contextual Importance (CI)** answers the question: *how much can the result (or the utility of it) change as a function of feature* $i$ or a set of features $\{i\}$ jointly, in the context $x$?
**Contextual Utility (CU)** answers the question: *how favorable is the value of feature* $i$ (or a set of features $\{i\}$ jointly) for a good (high-utility) result, in the context $x$?
CI of one feature or a set of features (jointly) $\{i\}$ compared to a superset of features $\{I\}$ is defined as
$$
\omega_{j,\{i\},\{I\}}(x)=\frac{umax_{j}(x,\{i\})-umin_{j}(x,\{i\})}{umax_{j}(x,\{I\})-umin_{j}(x,\{I\})},
$$
where $\{i\} \subseteq \{I\}$ and $\{I\} \subseteq \{1,\dots,n\}$. $x$ is the instance/context to be explained and defines the values of input features that do not belong to $\{i\}$ or $\{I\}$. In practice, CI is calculated as:
$$
\omega_{j,\{i\},\{I\}}(x)= \frac{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}{ ymax_{j,\{I\}}(x)-ymin_{j,\{I\}}(x)},
$$
where $ymin_{j}()$ and $ymax_{j}()$ are the minimal and maximal $y_{j}$ values observed for output $j$.
CU is defined as
$$
CU_{j,\{i\}}(x)=\frac{u_{j}(x)-umin_{j,\{i\}}(x)}{umax_{j,\{i\}}(x)-umin_{j,\{i\}}(x)}.
$$
When $u_{j}(y_{j})=Ay_{j}+b$, this can be written as:
$$
CU_{j,\{i\}}(x)=\left|\frac{ y_{j}(x)-yumin_{j,\{i\}}(x)}{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}\right|,
$$
where $yumin=ymin$ if $A$ is positive and $yumin=ymax$ if $A$ is negative.
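To make the definitions concrete, here is a minimal numerical sketch of how CI and CU can be estimated for a single feature. Everything in it (the toy model `f`, the feature ranges, the instance) is made up for illustration; the `ciu` package performs this estimation internally with proper sampling.
```{r ci_cu_toy_sketch}
# Toy "black-box" model with inputs and outputs in [0, 1]:
f <- function(x1, x2) 0.3 * x1 + 0.7 * x2
# Instance/context x to explain:
x <- c(x1 = 0.5, x2 = 0.8)
# Vary feature x2 over its full range [0, 1], keeping x1 fixed at its value in x:
grid <- seq(0, 1, length.out = 100)
y <- f(x["x1"], grid)
ymin <- min(y); ymax <- max(y)
# With {I} = all features, the output range of f over [0, 1]^2 is [0, 1]:
CI <- (ymax - ymin) / (1 - 0)
# CU: where the actual output lies within the observed [ymin, ymax] range:
CU <- (f(x["x1"], x["x2"]) - ymin) / (ymax - ymin)
cat("CI:", CI, "CU:", CU, "\n")
```
Since `f` is linear with a positive coefficient 0.7 for `x2`, CI recovers that coefficient (the output can move by 0.7 of the total range), and CU equals the feature's position within its own range (0.8).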
# Classification Example: Titanic
The Titanic data set is a classification task with classes `yes` or `no` for the probability of survival. Our Random Forest model achieved 81.1% classification accuracy on the test set. The studied instance "Johnny D" (an 8-year-old boy traveling alone) and the model are the same as used in *Przemyslaw Biecek and Tomasz Burzykowski. Explanatory Model Analysis. Chapman and Hall/CRC, New York, 2021*.
First load necessary packages:
```{r loadlibs, message=FALSE, results='hide'}
# Necessary packages
library(ggplot2)
library(randomForest)
library(caret)
library("DALEX")
```
Train Random Forest model:
```{r}
library(ciu)
# Ensure some repeatability.
set.seed(32)
# Some pre-processing
titanic_train <- titanic[,c("survived", "class", "gender", "age", "sibsp", "parch", "fare", "embarked")]
titanic_train$survived <- factor(titanic_train$survived)
titanic_train$gender <- factor(titanic_train$gender)
titanic_train$embarked <- factor(titanic_train$embarked)
titanic_train <- na.omit(titanic_train)
# Train with caret/Random Forest (we don't care about train/test set for this example)
kfoldcv <- caret::trainControl(method="cv", number=10)
model_rf <- caret::train(survived ~ ., titanic_train, method="rf", trControl=kfoldcv)
```
Create the test instance and the CIU object, and explain.
```{r}
# Create test instance (8-year old boy)
new_passenger <- data.frame(
class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew", "restaurant staff", "victualling crew")),
gender = factor("male", levels = c("female", "male")),
age = 8,
sibsp = 0,
parch = 0,
fare = 72,
embarked = factor("Cherbourg", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)
# Create CIU object and get barplot explanation
ciu <- ciu.new(model_rf, survived~., titanic_train)
p <- ciu$ggplot.col.ciu(new_passenger); print(p)
```
This CIU bar plot explanation "explains" the probability of survival, which is 63.6%, as well as the probability of non-survival. The bar lengths show the CI values and the bar colors correspond to the CU values.
The `ciu` package provides numerous visualization possibilities. The plot above is produced using `ggplot`, but there are also methods/functions that use R's standard (old) plot functionality. In practice, a more clearly "counter-factual" explanation seems to be more easily understood, as produced by the following code:
```{r}
print(ciu$ggplot.col.ciu(new_passenger, output.names = "yes", plot.mode = "overlap"))
```
In this plot, the transparent bar shows the CI value, *i.e.* how much the result could change with a different value than the current one. The solid part shows how "good" the current value is (CU). This explanation can be considered counter-factual (what-if) because it shows, *e.g.*, that being accompanied by even one parent would significantly increase the probability of survival (the feature $parch$).
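This what-if reading can be checked directly against the model. The following sketch (not part of the original example; it just uses `caret`'s standard `predict`) re-predicts the same instance with `parch` set to 1:
```{r what_if_parch}
# What-if sketch: the same boy, but accompanied by one parent.
with_parent <- new_passenger
with_parent$parch <- 1
predict(model_rf, newdata = with_parent, type = "prob")
```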
For one input feature, it is easy to see exactly how both CI and CU values are calculated, as shown by the following call and plot (the red dot shows the value for instance $x$):
```{r titanic_age}
print(ciu$ggplot.ciu(new_passenger, ind.input = 3, ind.output = 2, illustrate.CIU = TRUE))
```
It is also possible to obtain textual explanations:
```{r}
cat(ciu$textual(new_passenger, ind.output = 2, use.text.effects = TRUE))
```
Raw CIU values (for one feature/input) can be obtained with the `explain` method:
```{r}
ciu$explain(new_passenger, ind.inputs.to.explain = 3)
```
The `meta.explain` method returns an "explanation object" that can also be passed as a parameter to the plotting/textual functions in order to evaluate the explanation only once:
```{r meta_explain_titanic}
mciu <- ciu$meta.explain(new_passenger)
# Display result values for all inputs
print(mciu$ciuvals)
# More practical to use ciu.list.to.frame (out.ind=2 corresponds to "yes" for Titanic):
ciu.list.to.frame(mciu$ciuvals, out.ind = 2)
```
## Contextual influence
The yellow/orange $y(u(0))$ line shown in the plot above is related to what is called Contextual influence, which can be calculated from CI and CU as follows:
$$
\phi_{j,\{i\},\{I\}}(x)=\omega_{j,\{i\},\{I\}}(x)(CU_{j,\{i\}}(x) - \phi_{0}),
$$
where $\phi_{0}$ is the *baseline/reference* value ($y(u(0))$ in the plot). For instance, $\phi_{0}=0.5$ signifies using the average utility value $0.5$ as the baseline, which is the case in the $age$ plot above. An explanation using Contextual influence can be obtained as follows:
```{r contextual_influence_titanic, message=FALSE, warning=FALSE}
print(ciu$ggplot.col.ciu(new_passenger, output.names = "yes", use.influence = TRUE) +
scale_fill_gradient(low="firebrick", high="steelblue") +
labs(x ="", y = expression(phi)))
```
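To make the arithmetic concrete, here is a minimal sketch of the influence formula; the CI and CU values below are made up for illustration:
```{r influence_sketch}
# Minimal sketch of the Contextual influence formula.
CI <- c(age = 0.42, parch = 0.15)  # hypothetical Contextual Importance values
CU <- c(age = 0.80, parch = 0.95)  # hypothetical Contextual Utility values
phi0 <- 0.5                        # baseline/reference utility
influence <- CI * (CU - phi0)
print(influence)
```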
**Remark:** The equation for Contextual influence is similar to the definition of Shapley values for linear models, except that the input value $x_{i}$ is replaced by its utility value(s) $CU_{j,\{i\}}(x)$. In practice, **all *Additive Feature Attribution (AFA)* methods estimate influence values, not feature importance**. Most state-of-the-art methods, such as *Shapley values* and *LIME*, are AFA methods.
Influence values give no counter-factual information and are easily misinterpreted. Below, we create a Shapley value explanation using the `iml` package. In that explanation, for instance, the close-to-zero Shapley value for $parch$ gives the impression that it is a non-important feature, which is clearly wrong based on the CIU explanation.
```{r shapley_titanic}
library(iml)
predictor <- Predictor$new(model_rf, data = subset(titanic_train, select=-survived), y = titanic_train[,"survived"])
shapley <- Shapley$new(predictor, x.interest = new_passenger)
d <- shapley$results; d <- d[d$class=='yes',]; d$sign <- d$phi>=0
p <- ggplot(d) + geom_col(aes(x=reorder(feature.value, phi), y=phi, fill=sign)) +
coord_flip() +
labs(x ="", y = expression(phi)) + theme(legend.position = "none") +
scale_fill_manual("legend", values = c("FALSE" = "firebrick", "TRUE" = "steelblue"))
print(p)
```
It might also be worth mentioning that the Shapley value explanation has a much greater variance than the CIU (and Contextual influence) explanation with the same number of samples. This is presumably due to the fundamental difference between estimating min/max output values for CIU and estimating a kind of gradient with AFA methods.
# Intermediate Concepts
CIU can use named feature coalitions and structured vocabularies. Such vocabularies allow explanations at any abstraction level and can make explanations interactive.
The following code snippet plots the joint effect of the features $age$ and $parch$ for the studied Titanic case (applicable to numeric features). It shows how the coalition of those two features affects the output value, and how CI and CU can be deduced in the same way as for a single feature.
```{r}
ciu$plot.ciu.3D(new_passenger, c(5,3), ind.output = 2, theta = 50, phi = 10,
col = "lightblue", ltheta = 120, shade = 0.75)
```
We define a small vocabulary for Titanic as follows:
```{r}
wealth<-c(1,6); family<-c(4,5); gender<-c(2); age<-c(3); embarked <- c(7)
Titanic.voc <- list("WEALTH"=wealth, "FAMILY"=family, "Gender"=gender,
"Age"=age, "Embarkment port"=embarked)
```
Then we create a new CIU object that uses that vocabulary and get a top-level explanation.
```{r}
titanic_ciu <- ciu.new(model_rf, survived~., titanic_train, vocabulary = Titanic.voc)
meta.top <- titanic_ciu$meta.explain(new_passenger[,-8], concepts.to.explain=names(Titanic.voc), n.samples = 1000)
```
First, a barplot explanation:
```{r}
print(titanic_ciu$ggplot.col.ciu(new_passenger[,-8], output.names = "yes", ciu.meta = meta.top, plot.mode = "overlap"))
```
Then we explain WEALTH and FAMILY:
```{r}
print(titanic_ciu$ggplot.col.ciu(new_passenger[,-8], ind.inputs = Titanic.voc$FAMILY,
output.names = "yes", target.concept = "FAMILY",
target.ciu = meta.top$ciuvals[["FAMILY"]], n.samples = 100,
plot.mode = "overlap"))
print(titanic_ciu$ggplot.col.ciu(new_passenger[,-8],ind.inputs = Titanic.voc$WEALTH,
output.names = "yes", target.concept = "WEALTH",
target.ciu = meta.top$ciuvals[["WEALTH"]], n.samples = 100,
plot.mode = "overlap"))
```
Same thing using textual explanations:
```{r}
cat(titanic_ciu$textual(new_passenger[,-8], use.text.effects = TRUE, ind.output = 2,
ciu.meta = meta.top), "\n")
cat(titanic_ciu$textual(new_passenger[,-8], use.text.effects = TRUE, ind.output = 2,
ind.inputs = Titanic.voc$FAMILY, target.concept = "FAMILY",
target.ciu = meta.top$ciuvals[["FAMILY"]], n.samples = 100), "\n")
cat(titanic_ciu$textual(new_passenger[,-8], use.text.effects = TRUE, ind.output = 2,
ind.inputs = Titanic.voc$WEALTH, target.concept = "WEALTH",
target.ciu = meta.top$ciuvals[["WEALTH"]], n.samples = 100), "\n")
```
## Ames housing example
Ames Housing is a data set about properties in the town of Ames, Iowa (US). It contains over 80 features that can be used for learning to estimate the sale price. The following code imports the data set, does some pre-processing and trains a Gradient Boosting model:
```{r train_ames, message=FALSE, results='hide', warning=FALSE}
library(AmesHousing)
# make_ames() returns the processed version of the Ames data set
ames <- data.frame(make_ames())
# Split into train/test data sets
target <- 'Sale_Price'
trainIdx <- createDataPartition(ames[,target], p=0.8, list=FALSE)
trainData <- ames[trainIdx,]
testData <- ames[-trainIdx,]
# Train (this will take a while!)
kfoldcv <- trainControl(method="cv", number=10)
exec.time <- system.time(
Ames.gbm <<- train(Sale_Price~., trainData, method="gbm", trControl=kfoldcv))
# Training set performance (remember that prices are high, so RMSE will be high too!)
res <- predict(Ames.gbm, newdata=trainData)
cat(paste("Training set RMSE:", RMSE(trainData$Sale_Price, res), "\n"))
# Test set performance (remember that prices are high, so RMSE will be high too!)
res <- predict(Ames.gbm, newdata=testData)
cat(paste("Test set RMSE:", RMSE(testData$Sale_Price, res)))
```
We create our vocabulary (only two levels this time) and initialize the CIU object:
```{r}
Ames.voc <- list(
"Garage"=c(58,59,60,61,62,63),
"Basement"=c(30,31,33,34,35,36,37,38,47,48),
"Lot"=c(3,4,7,8,9,10,11),
"Access"=c(13,14),
"House type"=c(1,15,16,21),
"House aesthetics"=c(22,23,24,25,26),
"House condition"=c(17,18,19,20,27,28),
"First floor surface"=c(43),
"Above ground living area"=which(names(ames)=="Gr_Liv_Area"))
Ames.voc_ciu <- ciu.new(Ames.gbm, Sale_Price~., trainData, vocabulary = Ames.voc)
```
We start with an "explanation" using all 80 basic features, which is overly detailed and not very readable for "ordinary" humans:
```{r}
# We take an expensive house
inst.ind <- which(testData$Sale_Price>500000)[1]
instance <- subset(testData[inst.ind,], select=-Sale_Price)
# Explain
Ames_ciu.meta <- Ames.voc_ciu$meta.explain(instance)
print(Ames.voc_ciu$ggplot.col.ciu(instance, ciu.meta=Ames_ciu.meta, plot.mode = "overlap") +
labs(title="", x ="", y="CI", fill="CU"))
```
Then the same, using highest-level concepts:
```{r}
meta.top <- Ames.voc_ciu$meta.explain(instance, concepts.to.explain=names(Ames.voc),
                                      n.samples = 1000)
print(Ames.voc_ciu$ggplot.col.ciu(instance, ciu.meta = meta.top,
                                  plot.mode = "overlap"))
```
Then we explain some intermediate concepts further:
```{r ames_subexplanations}
# House condition
print(Ames.voc_ciu$ggplot.col.ciu(instance, ind.inputs = Ames.voc$`House condition`,
target.concept = "House condition", plot.mode = "overlap"))
# Basement
print(Ames.voc_ciu$ggplot.col.ciu(instance, ind.inputs = Ames.voc$Basement,
target.concept = "Basement", plot.mode = "overlap"))
# Garage
print(Ames.voc_ciu$ggplot.col.ciu(instance, ind.inputs = Ames.voc$Garage,
target.concept = "Garage", plot.mode = "overlap"))
```
This vocabulary is just an example of the kind of concepts a human typically deals with. Vocabularies can be built freely (or learned, when possible) and used freely, so that different vocabularies can even be used for different users.
# Installation
`ciu` is available from CRAN at <https://cran.r-project.org/web/packages/ciu/index.html> and can be installed using the standard `install.packages("ciu")` command.
However, in order to use the latest developments, `ciu` can be installed directly from GitHub with the commands:
``` r
# install.packages('devtools') # Uncomment if devtools wasn't installed already
devtools::install_github('KaryFramling/ciu')
```
**Remark**: If you get an error about an inconsistency in the Help file database, restart R. This seems to happen if you remove `ciu` and then reinstall it straight away; the uninstall apparently messes up the help database temporarily, and this is not specific to `ciu`.
# Miscellaneous
The examples and code snippets shown here only cover the main capabilities of CIU and this implementation. Here is a short list of other features and "good to know" information.
## Various examples and test cases
The file [`TestCases.R`](TestCases.R) contains functions for running and testing CIU with numerous functions and data sets (Iris, Boston, Heart Disease, UCI Cars, Diamonds, Titanic, Adult, Ames housing, etc.) and machine learning models (lda, Random Forest, GBM).
## Support for different AI/ML models
For the moment, any `caret` model should work (and most of them have been tested). Any `mlr3` model should also work, but only `classif.rpart` has been tested. The `MASS::lda` and `stats::lm` models are also supported. Other models might work directly as well; the default behaviour is to call `pred <- predict(model, inputs)` and retrieve the result from `pred$posterior`.
For any other model, the prediction function to be used can be given as the value of the `predict.function` argument of the `ciu.new` method. For instance, for `mlr3` a function `predf.ciu <- function(m, newdata) { m$predict_newdata(newdata)$prob }` is sufficient.
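As a sketch (not executed here), assuming an already trained `mlr3` learner stored in a hypothetical variable `mlr3.learner`, the call could look as follows:

``` r
# Hypothetical sketch: custom prediction function for an mlr3 learner,
# passed through the predict.function argument of ciu.new().
# `mlr3.learner` is assumed to be a trained classification learner.
predf.ciu <- function(m, newdata) { m$predict_newdata(newdata)$prob }
ciu.mlr3 <- ciu.new(mlr3.learner, survived ~ ., titanic_train,
                    predict.function = predf.ciu)
```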
Additionally, any object that implements the `CIU.BlackBox` interface with an `eval` function is automatically supported. The template for creating such a class looks as follows, and examples are provided in the package documentation (do `?ciu.blackbox.new`); a small usage sketch is shown after the template.
``` r
ciu.blackbox.new <- function() {
m <- list(eval = function(inputs) { NULL })
class(m) <- c("CIU.BlackBox", class(m))
return(m)
}
```
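As a usage sketch (an assumed example, not taken from the package documentation), one could wrap a `stats::lm` model like this:

``` r
# Sketch: wrap a linear regression model in a CIU.BlackBox object.
lm.blackbox.new <- function(lm.model) {
  m <- list(eval = function(inputs) { predict(lm.model, newdata = inputs) })
  class(m) <- c("CIU.BlackBox", class(m))
  return(m)
}
# Hypothetical usage with the built-in `cars` data set:
# fit <- lm(dist ~ speed, data = cars)
# bb <- lm.blackbox.new(fit)
# ciu.lm <- ciu.new(bb, dist ~ speed, cars)
```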
## Using old R "plot" functions
Some plotting functions are implemented that use R's plain graphics:
```{r}
ciu$barplot.ciu(new_passenger, ind.output = 2, sort = "CI")
```
There is also a method called `pie.ciu` that visualizes CIU using a pie chart, produced simply by replacing `ciu$barplot.ciu` with `ciu$pie.ciu`.
```{r}
ciu$pie.ciu(new_passenger, ind.output = 2)
```
However, these functions can be considered deprecated and will presumably not be developed further.
## About programming style
This package, and notably the CIU object implemented in ContextualImportanceUtility.R, is programmed in an object-oriented manner that dates back to before 2010. It essentially uses the same object-orientation principles as R6 classes, and is used in quite a similar way, but may look confusing to people who are not used to it. The principle is that the method `ciu.new()` first defines all the private instance variables and methods. Then, only those methods that are included in the `list` returned at the end are "public" and can be called using the `$` operator, *i.e.* in the same way as any list member.
This signifies that every CIU object has its own environment, which consumes more memory than storing only the necessary data would. Calling the method `as.ciu()` returns an object of class `ciu`, which is actually a list with all the data of the underlying CIU object. Classical functions exist for all `CIU` methods; they take a `ciu` object as their first parameter. The reasons for this are mainly:
1. Adding new functionality in a function-per-function manner, without having to touch or extend the core implementation in the CIU class.
2. Reduced memory consumption.
All examples in this README are programmed using the `CIU` class, and that is the recommended approach if there is no particular reason to use `ciu` instead. However, the file "TestCases_NoObject.R" implements the same functionality as "TestCases.R" using the `ciu` approach. A sketch of the two calling styles follows.
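The sketch below (not executed) contrasts the two styles for the Titanic example; it assumes the `ciu` object created earlier and the `as.ciu()` conversion described above:

``` r
# Object-oriented style (used throughout this README):
ciu$explain(new_passenger, ind.inputs.to.explain = 3)
# Functional style on a plain `ciu` list object:
ciu.obj <- ciu$as.ciu()
ciu.explain(ciu.obj, new_passenger, ind.inputs.to.explain = 3)
```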
# Related resources
A Python version that attempts to provide similar functionality to this R package is available at <https://github.com/KaryFramling/py-ciu>. An older version is available at <https://github.com/TimKam/py-ciu>, but it is no longer maintained or updated.
There are also two implementations of CIU for explaining images:
- R: <https://github.com/KaryFramling/ciu.image>
- Python: <https://github.com/KaryFramling/py.ciu.image>
The image explanation packages can be considered to be at proof-of-concept level (Nov. 2022). Future work on image explanation will presumably focus on the Python version, due to the extensive use of deep neural networks, which tend to be implemented mainly for Python.
# References
The use of this package is described in *FRÄMLING, Kary. Contextual Importance and Utility in R: the 'ciu' Package. In: Proceedings of 1st Workshop on Explainable Agency in Artificial Intelligence, at 35th AAAI Conference on Artificial Intelligence. Virtual, Online. February 8-9, 2021. pp. 110-114.*, accessible at <http://www.cs.hut.fi/~framling/Publications/CIU_XAI_WS_AAAI2021.pdf>.
The first publication on CIU was at the ICANN conference in Paris in 1995: *FRÄMLING, Kary, GRAILLOT, Didier. Extracting Explanations from Neural Networks. ICANN'95 proceedings, Vol. 1, Paris, France, 9-13 October, 1995. Paris: EC2 & Cie, 1995. pp. 163-168.*, accessible at <http://www.cs.hut.fi/u/framling/Publications/FramlingIcann95.pdf>.
The second publication, and the last before the "hibernation" of CIU research, is *FRÄMLING, Kary. Explaining Results of Neural Networks by Contextual Importance and Utility. Proceedings of the AISB'96 conference, 1-2 April 1996. Brighton, UK, 1996.*, accessible at <http://www.cs.hut.fi/u/framling/Publications/FramlingAisb96.pdf>.
# Author
[Kary Främling](http://github.com/KaryFramling)