binary model evaluation #20

Open
mattmills49 opened this issue Feb 10, 2017 · 3 comments

@mattmills49 (Owner)

Not sure what the output should be.

@mattmills49 (Owner, Author)

This is a function I wrote for some work analysis:

# requires dplyr (data_frame, filter, group_by, arrange, mutate, summarize, lead, left_join)
library(dplyr)

model_acc <- function(.data, .model){
  # predicted probabilities from the fitted model
  preds <- predict(.model, newdata = .data, type = "response")
  
  # accuracy / discrimination metrics, computed separately for each value of .data$train
  pred_data <- data_frame(actual = .data$Attn, preds = preds, type = .data$train) %>%
    filter(!is.na(preds)) %>%
    group_by(type) %>%
    arrange(desc(preds)) %>%
    mutate(TPR = cumsum(actual) / sum(actual),
           FPR = cumsum(1 - actual) / sum(1 - actual)) %>%
    summarize(MSE = mean((preds - actual)^2),
              AUC = sum(diff(FPR) * na.omit(lead(TPR) + TPR)) / 2,  # trapezoidal rule over the ROC curve
              TPR = mean(preds > .5 & actual == 1),
              TNR = mean(preds <= .5 & actual == 0),
              LSR = mean(actual * log(preds) + (1 - actual) * log(1 - preds)))
  
  # calibration: bin predictions (~1000 obs per bin) and compare mean prediction to mean outcome
  cal <- data_frame(actual = .data$Attn, preds = preds, type = .data$train) %>%
    filter(!is.na(preds)) %>%
    group_by(type) %>%
    mutate(pred_group = cut(preds, breaks = floor(n() / 1000), include.lowest = TRUE)) %>%
    group_by(type, pred_group) %>%
    summarize(mean_pred = mean(preds), mean_actual = mean(actual)) %>%
    summarize(Bias = mean(mean_pred - mean_actual))
  
  return(left_join(pred_data, cal, by = "type"))
}

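A minimal usage sketch with simulated data: example_data, x1, and fit are made up for illustration, but the Attn and train columns match what model_acc hard-codes.

# purely illustrative data; needs enough rows that the ~1000-obs calibration bins exist
set.seed(1)
example_data <- data.frame(
  x1 = rnorm(10000),
  train = rep(c("train", "test"), each = 5000)
)
example_data$Attn <- rbinom(10000, 1, plogis(example_data$x1))

fit <- glm(Attn ~ x1, data = example_data, family = binomial)

# one row per split with MSE, AUC, TPR, TNR, LSR, and Bias columns
model_acc(example_data, fit)
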
Positives:

  • Easily calculates model scores including MSE, AUC, True Positive Rate, True Negative Rate, and the Logistic Scoring Rule
  • Returns results in a data frame

Negatives:

  • Doesn't generalize to new data: the prediction column, the dependent column, and any grouping columns are hard-coded
  • Takes in both the model and data frame, which I'm not sure is necessary.
  • Calculates all the metrics. What if you only want one? What if you want to add one?

@mattmills49 (Owner, Author) commented Mar 14, 2017

To generalize, we could probably use formulas; for example, the call

binary_model_evaluation <- function(.data = model_data, prediction_formula = dependent ~ prediction, group_var = "split")

would tell us that, in the .data data frame, the dependent column is our "Y" variable and the prediction column holds the predictions from the model we are validating. We could also include a grouping variable (group_var) that would let us calculate the accuracy measurements on each split of the data (think a train/test split, or a time-based variable like season).
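A rough sketch of what that formula interface might look like, assuming dplyr; the function body, the helper column names, and the pred_prob column in the example call are illustrative only, not settled API.

library(dplyr)

binary_model_evaluation <- function(.data, prediction_formula, group_var = NULL) {
  # LHS of the formula names the observed 0/1 column, RHS names the predicted-probability column
  vars <- all.vars(prediction_formula)

  eval_data <- tibble(
    actual = .data[[vars[1]]],
    preds  = .data[[vars[2]]],
    group  = if (is.null(group_var)) "all" else .data[[group_var]]
  ) %>%
    filter(!is.na(preds)) %>%
    group_by(group) %>%
    arrange(desc(preds), .by_group = TRUE)

  eval_data %>%
    mutate(tpr_curve = cumsum(actual) / sum(actual),
           fpr_curve = cumsum(1 - actual) / sum(1 - actual)) %>%
    summarize(MSE = mean((preds - actual)^2),
              AUC = sum(diff(fpr_curve) * na.omit(lead(tpr_curve) + tpr_curve)) / 2,
              TPR = mean(preds > 0.5 & actual == 1),
              TNR = mean(preds <= 0.5 & actual == 0),
              LSR = mean(actual * log(preds) + (1 - actual) * log(1 - preds)))
}

# e.g. binary_model_evaluation(model_data, Attn ~ pred_prob, group_var = "train")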

@mattmills49 (Owner, Author)

If we wanted to make the metrics portable, we would have to write each one as a separate function and then pass the different measures in as a list or named vector.
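One possible shape for that, sketched below: each metric is a standalone function of (actual, preds), and the evaluator loops over whatever named list it is handed. The names here (evaluate_metrics, mse_metric, pred_prob, and so on) are made up for illustration.

# each metric is its own function of the observed outcomes and predicted probabilities
mse_metric <- function(actual, preds) mean((preds - actual)^2)
lsr_metric <- function(actual, preds) mean(actual * log(preds) + (1 - actual) * log(1 - preds))

# illustrative evaluator: apply every metric in the named list to each split of the data
evaluate_metrics <- function(.data, actual_col, pred_col, group_var,
                             metrics = list(MSE = mse_metric, LSR = lsr_metric)) {
  groups <- split(.data, .data[[group_var]])
  rows <- lapply(names(groups), function(g) {
    d <- groups[[g]]
    scores <- vapply(metrics, function(f) f(d[[actual_col]], d[[pred_col]]), numeric(1))
    data.frame(type = g, t(scores))
  })
  do.call(rbind, rows)
}

# e.g. evaluate_metrics(model_data, "Attn", "pred_prob", "train",
#                       metrics = list(MSE = mse_metric, LSR = lsr_metric))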
