Clarify weighting scheme documentation #126

abenefield1 · 2024-03-19T19:36:44Z

@vpnagraj , the way we currently have the plane_score weighting documented is a little unclear. It doesn't matter too much, but I think we can improve it. There's no indication in the documentation about what scale the weights need to be on. The way it's coded, it doesn't actually matter as long as the weights are >1, but it might save someone some time if we make that a little clearer. For example, I thought that the weighting scheme was on a scale of 0:1, which produces different results.

Also, it seems like someone could weigh something >>> 1. Which is fine I guess, but we might want to set bounds.

With the weights as they are currently, it's not super intuitive when we talk about "reducing the weights of some components," because the only way to effectively do that is to raise the weights of others.

abenefield1 · 2024-04-22T19:30:52Z

@vpnagraj, waiting on answer to the question about if we use n_flags_weighted or weights_denominator in plane_score output to both clarify the documentation and/or add a constraint (depending on answer, weights either >0 or >1) inside of the plane_score() function. In the meantime, here's the reprex that shows the outputs. It should help up decide what we want to do :)

# Reprex of plane_score weighting confusion.

######################################################
######################################################
# load rplanes and the original plane_score:
######################################################
######################################################
library(rplanes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union




######################################################
######################################################
# New plane_score:
######################################################
######################################################
# Proposed new plane_score that generates an error message for weights <= 0. May want to change this depending on answer about usin n_flags_weighted and weights_denominator
plane_score2 <- function(input, seed, components = "all", args = NULL, weights = NULL) {
  
  ## TODO: create this list as a built-in object?
  ## NOTE: consider using getFromNamespace to simplify this step
  complist <-
    list(cover = list(.function = plane_cover),
         diff = list(.function = plane_diff),
         taper = list(.function = plane_taper),
         `repeat` = list(.function = plane_repeat),
         trend = list(.function = plane_trend),
         shape = list(.function = plane_shape),
         zero = list(.function = plane_zero)
    )
  
  ## verify components for signal type ... some won't apply to observed
  allowed_observed <- c("repeat","diff","zero")
  
  ## handle condition when "all" components are requested
  ## observed data will only have a subset (the allowed compoments above)
  if(length(components) == 1 && components == "all") {
    if(is_observed(input)) {
      components <- allowed_observed
    } else {
      components <- names(complist)
    }
  } else {
    ## now handle case when components have been defined as a character vector (not as "all")
    if(is_observed(input)) {
      if(any(!components %in% allowed_observed)) {
        warning(paste0("Only the following components are allowed for observed signals: ", paste0(allowed_observed, collapse = ";")))
      }
      components <- components[components %in% allowed_observed]
      ## check if components vector is empty and if so stop
      if(length(components) == 0) {
        stop("The signal is observed but none of the allowed components for observed signals were specified.")
      }
    }
  }
  
  ## get all possible locations
  ## TODO: make this easier to access in prepped signal by adding locations element
  locs <- unique(input$data$location)
  
  ## create combinations of locations and components for mapping below
  to_map <- tidyr::crossing(locs = locs, comps = components)
  
  ## TODO: better way do this mapping and tracking of location / components by name
  ## NOTE: the !!!args[[.x]] will look for the name of the component in the list of args ...
  ## ... if it is not there (e.g., args = NULL) then the function will proceed with no additional args ...
  ## ... if it is there then the !!! will splice the named arguments passed in a list and apply them
  retl <-
    purrr::map2(to_map$locs, to_map$comps, ~ purrr::exec(complist[[.y]]$.function, location = .x, input = input, seed = seed, !!!args[[.y]])) %>%
    purrr::set_names(paste0(to_map$locs, "-", to_map$comps))
  
  ## grab full list of component results
  full_results <- purrr::map(retl, purrr::pluck)
  
  ## separator regex
  ## this should create something like "-(?=[diff|trend|taper])"
  ## when applied below that will split *only on hyphens that are followed by diff or trend or taper
  sep_rx <- paste0("-(?=[", paste0(components,collapse = "|"), "])")
  
  ## pull out summary tibble components and locations from the returned list above
  loc_tbl <-
    dplyr::tibble(loc_component = names(retl), indicator = purrr::map_lgl(retl, "indicator")) %>%
    tidyr::separate(.data$loc_component, into = c("location", "component"), sep = sep_rx)
  
  which_flags <-
    loc_tbl %>%
    dplyr::group_by(.data$location) %>%
    dplyr::filter(.data$indicator) %>%
    dplyr::summarise(flagged = paste0(.data$component, collapse = ";"))
  
  ## construct a tibble with weights for components
  ## if the weights argument is NULL then apply equal weights to all components
  if(is.null(weights)) {
    weights_tbl <- dplyr::tibble(component = components, weight = 1)
  } else {
    if(any(weights <= 0)) {
      stop("Weights must be a positive number")
    }
    
    if(!all(sort(names(weights)) == sort(components))) {
      stop("Weights must be provided as a vector with all components used included by name (e.g., c('diff' = 4, 'cover' = 1))")
    }
    
    weights_tbl <- dplyr::tibble(component = names(weights), weight = weights)
  }
  
  ## convert the tibble into a list
  loc_list <-
    loc_tbl %>%
    ## join to weights tbl defined above
    dplyr::left_join(weights_tbl, by = "component") %>%
    ## count number of flags (numerator for score)
    ## count number of components (denominator for score)
    ## convert to score
    ## hold onto the names of the components used
    dplyr::group_by(.data$location) %>%
    dplyr::summarise(n_flags = sum(.data$indicator),
                     n_components = dplyr::n(),
                     n_flags_weighted = sum(.data$indicator * .data$weight),
                     weights_denominator = sum(.data$weight),
                     score = .data$n_flags_weighted / .data$weights_denominator,
                     components = paste0(.data$component, collapse = ";")
    ) %>%
    ## join back to tibble that enumerates which components were flagged
    dplyr::left_join(which_flags, by = "location") %>%
    ## split into a list by location
    dplyr::group_split(.data$location, .keep = TRUE) %>%
    ## use the location column from the tibble in each split list element as name
    purrr::set_names(purrr::map_chr(., "location")) %>%
    ## make sure it is a list of lists (not a list of tibbles)
    purrr::map(., as.list)
  
  return(list(scores_summary = loc_list, scores_raw = loc_tbl, full_results = full_results))
  
}




######################################################
######################################################
# Read in data to test
######################################################
######################################################
## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
  filter(location == "09")

hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")
prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")

## read in example forecast and prep forecast signal
fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")
prepped_forecast <- read_forecast(fp) %>%
  filter(location == "09")%>%
  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)

## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")

## run plane scoring with all components
plane_score(input = prepped_forecast, seed = prepped_seed)
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#> 
#> $scores_summary$`09`$n_flags
#> [1] 4
#> 
#> $scores_summary$`09`$n_components
#> [1] 7
#> 
#> $scores_summary$`09`$n_flags_weighted
#> [1] 4
#> 
#> $scores_summary$`09`$weights_denominator
#> [1] 7
#> 
#> $scores_summary$`09`$score
#> [1] 0.5714286
#> 
#> $scores_summary$`09`$components
#> [1] "cover;diff;repeat;shape;taper;trend;zero"
#> 
#> $scores_summary$`09`$flagged
#> [1] "cover;diff;repeat;shape"
#> 
#> 
#> 
#> $scores_raw
#> # A tibble: 7 × 3
#>   location component indicator
#>   <chr>    <chr>     <lgl>    
#> 1 09       cover     TRUE     
#> 2 09       diff      TRUE     
#> 3 09       repeat    TRUE     
#> 4 09       shape     TRUE     
#> 5 09       taper     FALSE    
#> 6 09       trend     FALSE    
#> 7 09       zero      FALSE    
#> 
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-cover`$last_value
#> [1] 37
#> 
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#> 
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#> 
#> 
#> 
#> $full_results$`09-diff`
#> $full_results$`09-diff`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-diff`$values
#> [1] 37 74 75 75 75
#> 
#> $full_results$`09-diff`$evaluated_differences
#> [1] 37  1  0  0
#> 
#> $full_results$`09-diff`$maximum_difference
#> [1] 29
#> 
#> 
#> $full_results$`09-repeat`
#> $full_results$`09-repeat`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-repeat`$repeats
#> # A tibble: 3 × 6
#>   point location date       horizon lower upper
#>   <dbl> <chr>    <date>     <chr>   <dbl> <dbl>
#> 1    75 09       2022-11-12 2          50   100
#> 2    75 09       2022-11-19 3          43   106
#> 3    75 09       2022-11-26 4          38   112
#> 
#> 
#> $full_results$`09-shape`
#> $full_results$`09-shape`$indicator
#> [1] TRUE
#> 
#> 
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
#> 
#> 
#> $full_results$`09-trend`
#> $full_results$`09-trend`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-trend`$output
#> # A tibble: 20 × 7
#>    Location Index Date       Value Type     Changepoint Flagged
#>    <chr>    <int> <date>     <dbl> <chr>    <lgl>       <lgl>  
#>  1 09           1 2022-07-16     3 Observed FALSE       FALSE  
#>  2 09           2 2022-07-23     6 Observed FALSE       FALSE  
#>  3 09           3 2022-07-30     4 Observed FALSE       FALSE  
#>  4 09           4 2022-08-06     3 Observed FALSE       FALSE  
#>  5 09           5 2022-08-13     0 Observed FALSE       FALSE  
#>  6 09           6 2022-08-20     2 Observed FALSE       FALSE  
#>  7 09           7 2022-08-27     9 Observed FALSE       FALSE  
#>  8 09           8 2022-09-03     4 Observed FALSE       FALSE  
#>  9 09           9 2022-09-10     3 Observed FALSE       FALSE  
#> 10 09          10 2022-09-17     4 Observed FALSE       FALSE  
#> 11 09          11 2022-09-24     3 Observed FALSE       FALSE  
#> 12 09          12 2022-10-01     2 Observed FALSE       FALSE  
#> 13 09          13 2022-10-08     3 Observed FALSE       FALSE  
#> 14 09          14 2022-10-15     9 Observed FALSE       FALSE  
#> 15 09          15 2022-10-22     9 Observed TRUE        FALSE  
#> 16 09          16 2022-10-29    37 Observed FALSE       FALSE  
#> 17 09          17 2022-11-05    74 Forecast FALSE       FALSE  
#> 18 09          18 2022-11-12    75 Forecast FALSE       FALSE  
#> 19 09          19 2022-11-19    75 Forecast FALSE       FALSE  
#> 20 09          20 2022-11-26    75 Forecast FALSE       FALSE  
#> 
#> $full_results$`09-trend`$flagged_dates
#> [1] NA
#> 
#> 
#> $full_results$`09-zero`
#> $full_results$`09-zero`$indicator
#> [1] FALSE

## run plane scoring with select components
plane_score(input = prepped_forecast, seed = prepped_seed, components = c("cover","taper"))
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#> 
#> $scores_summary$`09`$n_flags
#> [1] 1
#> 
#> $scores_summary$`09`$n_components
#> [1] 2
#> 
#> $scores_summary$`09`$n_flags_weighted
#> [1] 1
#> 
#> $scores_summary$`09`$weights_denominator
#> [1] 2
#> 
#> $scores_summary$`09`$score
#> [1] 0.5
#> 
#> $scores_summary$`09`$components
#> [1] "cover;taper"
#> 
#> $scores_summary$`09`$flagged
#> [1] "cover"
#> 
#> 
#> 
#> $scores_raw
#> # A tibble: 2 × 3
#>   location component indicator
#>   <chr>    <chr>     <lgl>    
#> 1 09       cover     TRUE     
#> 2 09       taper     FALSE    
#> 
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-cover`$last_value
#> [1] 37
#> 
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#> 
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#> 
#> 
#> 
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74

## run plane scoring with all components and additional args
comp_args <- list("trend" = list("sig_lvl" = 0.05), "repeat" = list("prepend" = 4, "tolerance" = 8))
plane_score(input = prepped_forecast, seed = prepped_seed, args = comp_args)
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#> 
#> $scores_summary$`09`$n_flags
#> [1] 3
#> 
#> $scores_summary$`09`$n_components
#> [1] 7
#> 
#> $scores_summary$`09`$n_flags_weighted
#> [1] 3
#> 
#> $scores_summary$`09`$weights_denominator
#> [1] 7
#> 
#> $scores_summary$`09`$score
#> [1] 0.4285714
#> 
#> $scores_summary$`09`$components
#> [1] "cover;diff;repeat;shape;taper;trend;zero"
#> 
#> $scores_summary$`09`$flagged
#> [1] "cover;diff;shape"
#> 
#> 
#> 
#> $scores_raw
#> # A tibble: 7 × 3
#>   location component indicator
#>   <chr>    <chr>     <lgl>    
#> 1 09       cover     TRUE     
#> 2 09       diff      TRUE     
#> 3 09       repeat    FALSE    
#> 4 09       shape     TRUE     
#> 5 09       taper     FALSE    
#> 6 09       trend     FALSE    
#> 7 09       zero      FALSE    
#> 
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-cover`$last_value
#> [1] 37
#> 
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#> 
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#> 
#> 
#> 
#> $full_results$`09-diff`
#> $full_results$`09-diff`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-diff`$values
#> [1] 37 74 75 75 75
#> 
#> $full_results$`09-diff`$evaluated_differences
#> [1] 37  1  0  0
#> 
#> $full_results$`09-diff`$maximum_difference
#> [1] 29
#> 
#> 
#> $full_results$`09-repeat`
#> $full_results$`09-repeat`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-repeat`$repeats
#> # A tibble: 0 × 6
#> # ℹ 6 variables: point <dbl>, location <chr>, date <date>, horizon <chr>,
#> #   lower <dbl>, upper <dbl>
#> 
#> 
#> $full_results$`09-shape`
#> $full_results$`09-shape`$indicator
#> [1] TRUE
#> 
#> 
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
#> 
#> 
#> $full_results$`09-trend`
#> $full_results$`09-trend`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-trend`$output
#> # A tibble: 20 × 7
#>    Location Index Date       Value Type     Changepoint Flagged
#>    <chr>    <int> <date>     <dbl> <chr>    <lgl>       <lgl>  
#>  1 09           1 2022-07-16     3 Observed FALSE       FALSE  
#>  2 09           2 2022-07-23     6 Observed FALSE       FALSE  
#>  3 09           3 2022-07-30     4 Observed FALSE       FALSE  
#>  4 09           4 2022-08-06     3 Observed FALSE       FALSE  
#>  5 09           5 2022-08-13     0 Observed FALSE       FALSE  
#>  6 09           6 2022-08-20     2 Observed FALSE       FALSE  
#>  7 09           7 2022-08-27     9 Observed FALSE       FALSE  
#>  8 09           8 2022-09-03     4 Observed FALSE       FALSE  
#>  9 09           9 2022-09-10     3 Observed FALSE       FALSE  
#> 10 09          10 2022-09-17     4 Observed FALSE       FALSE  
#> 11 09          11 2022-09-24     3 Observed FALSE       FALSE  
#> 12 09          12 2022-10-01     2 Observed FALSE       FALSE  
#> 13 09          13 2022-10-08     3 Observed FALSE       FALSE  
#> 14 09          14 2022-10-15     9 Observed FALSE       FALSE  
#> 15 09          15 2022-10-22     9 Observed TRUE        FALSE  
#> 16 09          16 2022-10-29    37 Observed FALSE       FALSE  
#> 17 09          17 2022-11-05    74 Forecast FALSE       FALSE  
#> 18 09          18 2022-11-12    75 Forecast FALSE       FALSE  
#> 19 09          19 2022-11-19    75 Forecast FALSE       FALSE  
#> 20 09          20 2022-11-26    75 Forecast FALSE       FALSE  
#> 
#> $full_results$`09-trend`$flagged_dates
#> [1] NA
#> 
#> 
#> $full_results$`09-zero`
#> $full_results$`09-zero`$indicator
#> [1] FALSE




######################################################
######################################################
# Run tests on old and new plane_score for a variety of possible types of weights:
######################################################
######################################################


######################################################
# Test with integers
######################################################
comps <- c("cover", "taper", "diff")
wts <- c("cover" = 2, "taper" = 1, "diff" = 4)

# With plane_score as is:
integer_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
integer_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)[["scores_summary"]])

identical(integer_original, integer_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score




######################################################
# Test proportions that sum to 1
######################################################
wts2 <- wts/sum(wts)
sum(wts2)
#> [1] 1

# With plane_score as is:
props_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts2)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
props_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts2)[["scores_summary"]])

identical(props_original, props_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score IFF we never use n_flags_weighted or weights_denominator output




######################################################
# Test standardized weights
######################################################
wts3 <- wts/mean(wts)
sum(wts3)
#> [1] 3

# With plane_score as is:
mean_scaled_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts3)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
mean_scaled_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts3)[["scores_summary"]])

identical(mean_scaled_original, mean_scaled_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score IFF we never use n_flags_weighted or weights_denominator output




######################################################
# Test weights standardized around zero
######################################################
wts4 <- wts - mean(wts)
wts4
#>      cover      taper       diff 
#> -0.3333333 -1.3333333  1.6666667
round(mean(wts4), 1)
#> [1] 0
round(sum(wts4), 1)
#> [1] 0

# With plane_score as is:
centered_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts4)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
centered_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts4)[["scores_summary"]])
#> Error in plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a positive number

# This did not work as is. Produced unexpected results. Might be fine if comparing apples to apples, but it makes the resulting score confusing. With the edited plane_score, it doesn't run and produces an error message.


results <- data.frame(integer_original, integer_new, props_original, props_new, mean_scaled_original, mean_scaled_new, centered_original, centered_new = "error message")
results
#>                         integer_original       integer_new    props_original
#> 09.location                           09                09                09
#> 09.n_flags                             2                 2                 2
#> 09.n_components                        3                 3                 3
#> 09.n_flags_weighted                    6                 6 0.857142857142857
#> 09.weights_denominator                 7                 7                 1
#> 09.score               0.857142857142857 0.857142857142857 0.857142857142857
#> 09.components           cover;diff;taper  cover;diff;taper  cover;diff;taper
#> 09.flagged                    cover;diff        cover;diff        cover;diff
#>                                props_new mean_scaled_original   mean_scaled_new
#> 09.location                           09                   09                09
#> 09.n_flags                             2                    2                 2
#> 09.n_components                        3                    3                 3
#> 09.n_flags_weighted    0.857142857142857     2.57142857142857  2.57142857142857
#> 09.weights_denominator                 1                    3                 3
#> 09.score               0.857142857142857    0.857142857142857 0.857142857142857
#> 09.components           cover;diff;taper     cover;diff;taper  cover;diff;taper
#> 09.flagged                    cover;diff           cover;diff        cover;diff
#>                            centered_original  centered_new
#> 09.location                               09 error message
#> 09.n_flags                                 2 error message
#> 09.n_components                            3 error message
#> 09.n_flags_weighted         1.33333333333333 error message
#> 09.weights_denominator -4.44089209850063e-16 error message
#> 09.score                   -3002399751580330 error message
#> 09.components               cover;diff;taper error message
#> 09.flagged                        cover;diff error message

^{Created on 2024-04-22 with reprex v2.0.2}

vpnagraj · 2024-04-22T20:47:56Z

thanks @abenefield1 !

noting our follow up discussion for posterity ... yes, lets go with a constraint for weights >= 1

when youre done with that open a PR from the feature branch for this change to the v0.0.3 branch and i will review

abenefield1 · 2024-04-22T21:21:16Z

@vpnagraj

Changed plane_score so weights must be >= 1. Also updated documentation and initiated a PR (#134).

Checked and there was no issue with integers vs reals

# Reprex of plane_score with updates
# plane_score has been changed so that weights must be >= 1
library(rplanes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union




######################################################
######################################################
# Read in data to test
######################################################
######################################################
## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
  filter(location == "09")

hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")
prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")

## read in example forecast and prep forecast signal
fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")
prepped_forecast <- read_forecast(fp) %>%
  filter(location == "09")%>%
  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)

## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")


######################################################
######################################################
# Run tests on new plane_score for a variety of possible 
# types of numbers specified for weights:
######################################################
######################################################


######################################################
# Test with integers
######################################################
comps <- c("cover", "taper", "diff")
int_wts <- c("cover" = 1, "taper" = 2, "diff" = 4)

integer <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = int_wts)[["scores_summary"]])
# This should work with new plane_score, and it does.




######################################################
# Test with reals
######################################################
real_wts <- int_wts*1.5
real_wts
#> cover taper  diff 
#>   1.5   3.0   6.0

real <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = real_wts)[["scores_summary"]])
# This should work with new plane_score, and it does.




######################################################
# Test proportions that sum to 1
######################################################
prop_wts <- int_wts/sum(int_wts)
prop_wts
#>     cover     taper      diff 
#> 0.1428571 0.2857143 0.5714286
sum(prop_wts)
#> [1] 1

props <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = prop_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because weights < 1




######################################################
# Test standardized (mean scaled) weights
######################################################
mean_scaled_wts <- int_wts/mean(int_wts)
sum(mean_scaled_wts)
#> [1] 3
mean_scaled_wts
#>     cover     taper      diff 
#> 0.4285714 0.8571429 1.7142857

mean_scaled <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = mean_scaled_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because some weights < 1




######################################################
# Test weights centered around zero
######################################################
zero_centered_wts <- int_wts - mean(int_wts)
zero_centered_wts
#>      cover      taper       diff 
#> -1.3333333 -0.3333333  1.6666667
round(mean(zero_centered_wts), 1)
#> [1] 0
round(sum(zero_centered_wts), 1)
#> [1] 0

# With plane_score as is:
zero_centered <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = zero_centered_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because some weights < 1




results <- data.frame(integer, real, props = "error",  mean_scaled = "error", zero_centered = "error")
results
#>                                  integer              real props mean_scaled
#> 09.location                           09                09 error       error
#> 09.n_flags                             2                 2 error       error
#> 09.n_components                        3                 3 error       error
#> 09.n_flags_weighted                    5               7.5 error       error
#> 09.weights_denominator                 7              10.5 error       error
#> 09.score               0.714285714285714 0.714285714285714 error       error
#> 09.components           cover;diff;taper  cover;diff;taper error       error
#> 09.flagged                    cover;diff        cover;diff error       error
#>                        zero_centered
#> 09.location                    error
#> 09.n_flags                     error
#> 09.n_components                error
#> 09.n_flags_weighted            error
#> 09.weights_denominator         error
#> 09.score                       error
#> 09.components                  error
#> 09.flagged                     error

^{Created on 2024-04-22 with reprex v2.0.2}

vpnagraj · 2024-04-23T20:49:06Z

done and on the v0.0.3 branch (https://github.com/signaturescience/rplanes/tree/v0.0.3). will be merged to main with the v0.0.3 release.

vpnagraj mentioned this issue Apr 9, 2024

v0.0.3 release #128

Closed

9 tasks

abenefield1 linked a pull request Apr 22, 2024 that will close this issue

Edited plane_score so that weights must be >= 1. Also edited documentation #134

Closed

vpnagraj closed this as completed Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify weighting scheme documentation #126

Clarify weighting scheme documentation #126

abenefield1 commented Mar 19, 2024 •

edited

Loading

abenefield1 commented Apr 22, 2024

vpnagraj commented Apr 22, 2024

abenefield1 commented Apr 22, 2024

vpnagraj commented Apr 23, 2024

Clarify weighting scheme documentation #126

Clarify weighting scheme documentation #126

Comments

abenefield1 commented Mar 19, 2024 • edited Loading

abenefield1 commented Apr 22, 2024

vpnagraj commented Apr 22, 2024

abenefield1 commented Apr 22, 2024

vpnagraj commented Apr 23, 2024

abenefield1 commented Mar 19, 2024 •

edited

Loading