Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify weighting scheme documentation #126

Closed
abenefield1 opened this issue Mar 19, 2024 · 4 comments
Closed

Clarify weighting scheme documentation #126

abenefield1 opened this issue Mar 19, 2024 · 4 comments

Comments

@abenefield1
Copy link
Collaborator

abenefield1 commented Mar 19, 2024

@vpnagraj , the way we currently have the plane_score weighting documented is a little unclear. It doesn't matter too much, but I think we can improve it. There's no indication in the documentation about what scale the weights need to be on. The way it's coded, it doesn't actually matter as long as the weights are >1, but it might save someone some time if we make that a little clearer. For example, I thought that the weighting scheme was on a scale of 0:1, which produces different results.

Also, it seems like someone could weigh something >>> 1. Which is fine I guess, but we might want to set bounds.

With the weights as they are currently, it's not super intuitive when we talk about "reducing the weights of some components," because the only way to effectively do that is to raise the weights of others.

@vpnagraj vpnagraj mentioned this issue Apr 9, 2024
9 tasks
@abenefield1
Copy link
Collaborator Author

@vpnagraj, waiting on answer to the question about if we use n_flags_weighted or weights_denominator in plane_score output to both clarify the documentation and/or add a constraint (depending on answer, weights either >0 or >1) inside of the plane_score() function. In the meantime, here's the reprex that shows the outputs. It should help up decide what we want to do :)

# Reprex of plane_score weighting confusion.

######################################################
######################################################
# load rplanes and the original plane_score:
######################################################
######################################################
library(rplanes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union




######################################################
######################################################
# New plane_score:
######################################################
######################################################
# Proposed new plane_score that generates an error message for weights <= 0. May want to change this depending on answer about usin n_flags_weighted and weights_denominator
plane_score2 <- function(input, seed, components = "all", args = NULL, weights = NULL) {
  
  ## TODO: create this list as a built-in object?
  ## NOTE: consider using getFromNamespace to simplify this step
  complist <-
    list(cover = list(.function = plane_cover),
         diff = list(.function = plane_diff),
         taper = list(.function = plane_taper),
         `repeat` = list(.function = plane_repeat),
         trend = list(.function = plane_trend),
         shape = list(.function = plane_shape),
         zero = list(.function = plane_zero)
    )
  
  ## verify components for signal type ... some won't apply to observed
  allowed_observed <- c("repeat","diff","zero")
  
  ## handle condition when "all" components are requested
  ## observed data will only have a subset (the allowed compoments above)
  if(length(components) == 1 && components == "all") {
    if(is_observed(input)) {
      components <- allowed_observed
    } else {
      components <- names(complist)
    }
  } else {
    ## now handle case when components have been defined as a character vector (not as "all")
    if(is_observed(input)) {
      if(any(!components %in% allowed_observed)) {
        warning(paste0("Only the following components are allowed for observed signals: ", paste0(allowed_observed, collapse = ";")))
      }
      components <- components[components %in% allowed_observed]
      ## check if components vector is empty and if so stop
      if(length(components) == 0) {
        stop("The signal is observed but none of the allowed components for observed signals were specified.")
      }
    }
  }
  
  ## get all possible locations
  ## TODO: make this easier to access in prepped signal by adding locations element
  locs <- unique(input$data$location)
  
  ## create combinations of locations and components for mapping below
  to_map <- tidyr::crossing(locs = locs, comps = components)
  
  ## TODO: better way do this mapping and tracking of location / components by name
  ## NOTE: the !!!args[[.x]] will look for the name of the component in the list of args ...
  ## ... if it is not there (e.g., args = NULL) then the function will proceed with no additional args ...
  ## ... if it is there then the !!! will splice the named arguments passed in a list and apply them
  retl <-
    purrr::map2(to_map$locs, to_map$comps, ~ purrr::exec(complist[[.y]]$.function, location = .x, input = input, seed = seed, !!!args[[.y]])) %>%
    purrr::set_names(paste0(to_map$locs, "-", to_map$comps))
  
  ## grab full list of component results
  full_results <- purrr::map(retl, purrr::pluck)
  
  ## separator regex
  ## this should create something like "-(?=[diff|trend|taper])"
  ## when applied below that will split *only on hyphens that are followed by diff or trend or taper
  sep_rx <- paste0("-(?=[", paste0(components,collapse = "|"), "])")
  
  ## pull out summary tibble components and locations from the returned list above
  loc_tbl <-
    dplyr::tibble(loc_component = names(retl), indicator = purrr::map_lgl(retl, "indicator")) %>%
    tidyr::separate(.data$loc_component, into = c("location", "component"), sep = sep_rx)
  
  which_flags <-
    loc_tbl %>%
    dplyr::group_by(.data$location) %>%
    dplyr::filter(.data$indicator) %>%
    dplyr::summarise(flagged = paste0(.data$component, collapse = ";"))
  
  ## construct a tibble with weights for components
  ## if the weights argument is NULL then apply equal weights to all components
  if(is.null(weights)) {
    weights_tbl <- dplyr::tibble(component = components, weight = 1)
  } else {
    if(any(weights <= 0)) {
      stop("Weights must be a positive number")
    }
    
    if(!all(sort(names(weights)) == sort(components))) {
      stop("Weights must be provided as a vector with all components used included by name (e.g., c('diff' = 4, 'cover' = 1))")
    }
    
    weights_tbl <- dplyr::tibble(component = names(weights), weight = weights)
  }
  
  ## convert the tibble into a list
  loc_list <-
    loc_tbl %>%
    ## join to weights tbl defined above
    dplyr::left_join(weights_tbl, by = "component") %>%
    ## count number of flags (numerator for score)
    ## count number of components (denominator for score)
    ## convert to score
    ## hold onto the names of the components used
    dplyr::group_by(.data$location) %>%
    dplyr::summarise(n_flags = sum(.data$indicator),
                     n_components = dplyr::n(),
                     n_flags_weighted = sum(.data$indicator * .data$weight),
                     weights_denominator = sum(.data$weight),
                     score = .data$n_flags_weighted / .data$weights_denominator,
                     components = paste0(.data$component, collapse = ";")
    ) %>%
    ## join back to tibble that enumerates which components were flagged
    dplyr::left_join(which_flags, by = "location") %>%
    ## split into a list by location
    dplyr::group_split(.data$location, .keep = TRUE) %>%
    ## use the location column from the tibble in each split list element as name
    purrr::set_names(purrr::map_chr(., "location")) %>%
    ## make sure it is a list of lists (not a list of tibbles)
    purrr::map(., as.list)
  
  return(list(scores_summary = loc_list, scores_raw = loc_tbl, full_results = full_results))
  
}




######################################################
######################################################
# Read in data to test
######################################################
######################################################
## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
  filter(location == "09")

hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")
prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")

## read in example forecast and prep forecast signal
fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")
prepped_forecast <- read_forecast(fp) %>%
  filter(location == "09")%>%
  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)

## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")

## run plane scoring with all components
plane_score(input = prepped_forecast, seed = prepped_seed)
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#> 
#> $scores_summary$`09`$n_flags
#> [1] 4
#> 
#> $scores_summary$`09`$n_components
#> [1] 7
#> 
#> $scores_summary$`09`$n_flags_weighted
#> [1] 4
#> 
#> $scores_summary$`09`$weights_denominator
#> [1] 7
#> 
#> $scores_summary$`09`$score
#> [1] 0.5714286
#> 
#> $scores_summary$`09`$components
#> [1] "cover;diff;repeat;shape;taper;trend;zero"
#> 
#> $scores_summary$`09`$flagged
#> [1] "cover;diff;repeat;shape"
#> 
#> 
#> 
#> $scores_raw
#> # A tibble: 7 × 3
#>   location component indicator
#>   <chr>    <chr>     <lgl>    
#> 1 09       cover     TRUE     
#> 2 09       diff      TRUE     
#> 3 09       repeat    TRUE     
#> 4 09       shape     TRUE     
#> 5 09       taper     FALSE    
#> 6 09       trend     FALSE    
#> 7 09       zero      FALSE    
#> 
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-cover`$last_value
#> [1] 37
#> 
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#> 
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#> 
#> 
#> 
#> $full_results$`09-diff`
#> $full_results$`09-diff`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-diff`$values
#> [1] 37 74 75 75 75
#> 
#> $full_results$`09-diff`$evaluated_differences
#> [1] 37  1  0  0
#> 
#> $full_results$`09-diff`$maximum_difference
#> [1] 29
#> 
#> 
#> $full_results$`09-repeat`
#> $full_results$`09-repeat`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-repeat`$repeats
#> # A tibble: 3 × 6
#>   point location date       horizon lower upper
#>   <dbl> <chr>    <date>     <chr>   <dbl> <dbl>
#> 1    75 09       2022-11-12 2          50   100
#> 2    75 09       2022-11-19 3          43   106
#> 3    75 09       2022-11-26 4          38   112
#> 
#> 
#> $full_results$`09-shape`
#> $full_results$`09-shape`$indicator
#> [1] TRUE
#> 
#> 
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
#> 
#> 
#> $full_results$`09-trend`
#> $full_results$`09-trend`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-trend`$output
#> # A tibble: 20 × 7
#>    Location Index Date       Value Type     Changepoint Flagged
#>    <chr>    <int> <date>     <dbl> <chr>    <lgl>       <lgl>  
#>  1 09           1 2022-07-16     3 Observed FALSE       FALSE  
#>  2 09           2 2022-07-23     6 Observed FALSE       FALSE  
#>  3 09           3 2022-07-30     4 Observed FALSE       FALSE  
#>  4 09           4 2022-08-06     3 Observed FALSE       FALSE  
#>  5 09           5 2022-08-13     0 Observed FALSE       FALSE  
#>  6 09           6 2022-08-20     2 Observed FALSE       FALSE  
#>  7 09           7 2022-08-27     9 Observed FALSE       FALSE  
#>  8 09           8 2022-09-03     4 Observed FALSE       FALSE  
#>  9 09           9 2022-09-10     3 Observed FALSE       FALSE  
#> 10 09          10 2022-09-17     4 Observed FALSE       FALSE  
#> 11 09          11 2022-09-24     3 Observed FALSE       FALSE  
#> 12 09          12 2022-10-01     2 Observed FALSE       FALSE  
#> 13 09          13 2022-10-08     3 Observed FALSE       FALSE  
#> 14 09          14 2022-10-15     9 Observed FALSE       FALSE  
#> 15 09          15 2022-10-22     9 Observed TRUE        FALSE  
#> 16 09          16 2022-10-29    37 Observed FALSE       FALSE  
#> 17 09          17 2022-11-05    74 Forecast FALSE       FALSE  
#> 18 09          18 2022-11-12    75 Forecast FALSE       FALSE  
#> 19 09          19 2022-11-19    75 Forecast FALSE       FALSE  
#> 20 09          20 2022-11-26    75 Forecast FALSE       FALSE  
#> 
#> $full_results$`09-trend`$flagged_dates
#> [1] NA
#> 
#> 
#> $full_results$`09-zero`
#> $full_results$`09-zero`$indicator
#> [1] FALSE

## run plane scoring with select components
plane_score(input = prepped_forecast, seed = prepped_seed, components = c("cover","taper"))
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#> 
#> $scores_summary$`09`$n_flags
#> [1] 1
#> 
#> $scores_summary$`09`$n_components
#> [1] 2
#> 
#> $scores_summary$`09`$n_flags_weighted
#> [1] 1
#> 
#> $scores_summary$`09`$weights_denominator
#> [1] 2
#> 
#> $scores_summary$`09`$score
#> [1] 0.5
#> 
#> $scores_summary$`09`$components
#> [1] "cover;taper"
#> 
#> $scores_summary$`09`$flagged
#> [1] "cover"
#> 
#> 
#> 
#> $scores_raw
#> # A tibble: 2 × 3
#>   location component indicator
#>   <chr>    <chr>     <lgl>    
#> 1 09       cover     TRUE     
#> 2 09       taper     FALSE    
#> 
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-cover`$last_value
#> [1] 37
#> 
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#> 
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#> 
#> 
#> 
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74

## run plane scoring with all components and additional args
comp_args <- list("trend" = list("sig_lvl" = 0.05), "repeat" = list("prepend" = 4, "tolerance" = 8))
plane_score(input = prepped_forecast, seed = prepped_seed, args = comp_args)
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#> 
#> $scores_summary$`09`$n_flags
#> [1] 3
#> 
#> $scores_summary$`09`$n_components
#> [1] 7
#> 
#> $scores_summary$`09`$n_flags_weighted
#> [1] 3
#> 
#> $scores_summary$`09`$weights_denominator
#> [1] 7
#> 
#> $scores_summary$`09`$score
#> [1] 0.4285714
#> 
#> $scores_summary$`09`$components
#> [1] "cover;diff;repeat;shape;taper;trend;zero"
#> 
#> $scores_summary$`09`$flagged
#> [1] "cover;diff;shape"
#> 
#> 
#> 
#> $scores_raw
#> # A tibble: 7 × 3
#>   location component indicator
#>   <chr>    <chr>     <lgl>    
#> 1 09       cover     TRUE     
#> 2 09       diff      TRUE     
#> 3 09       repeat    FALSE    
#> 4 09       shape     TRUE     
#> 5 09       taper     FALSE    
#> 6 09       trend     FALSE    
#> 7 09       zero      FALSE    
#> 
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-cover`$last_value
#> [1] 37
#> 
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#> 
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#> 
#> 
#> 
#> $full_results$`09-diff`
#> $full_results$`09-diff`$indicator
#> [1] TRUE
#> 
#> $full_results$`09-diff`$values
#> [1] 37 74 75 75 75
#> 
#> $full_results$`09-diff`$evaluated_differences
#> [1] 37  1  0  0
#> 
#> $full_results$`09-diff`$maximum_difference
#> [1] 29
#> 
#> 
#> $full_results$`09-repeat`
#> $full_results$`09-repeat`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-repeat`$repeats
#> # A tibble: 0 × 6
#> # ℹ 6 variables: point <dbl>, location <chr>, date <date>, horizon <chr>,
#> #   lower <dbl>, upper <dbl>
#> 
#> 
#> $full_results$`09-shape`
#> $full_results$`09-shape`$indicator
#> [1] TRUE
#> 
#> 
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
#> 
#> 
#> $full_results$`09-trend`
#> $full_results$`09-trend`$indicator
#> [1] FALSE
#> 
#> $full_results$`09-trend`$output
#> # A tibble: 20 × 7
#>    Location Index Date       Value Type     Changepoint Flagged
#>    <chr>    <int> <date>     <dbl> <chr>    <lgl>       <lgl>  
#>  1 09           1 2022-07-16     3 Observed FALSE       FALSE  
#>  2 09           2 2022-07-23     6 Observed FALSE       FALSE  
#>  3 09           3 2022-07-30     4 Observed FALSE       FALSE  
#>  4 09           4 2022-08-06     3 Observed FALSE       FALSE  
#>  5 09           5 2022-08-13     0 Observed FALSE       FALSE  
#>  6 09           6 2022-08-20     2 Observed FALSE       FALSE  
#>  7 09           7 2022-08-27     9 Observed FALSE       FALSE  
#>  8 09           8 2022-09-03     4 Observed FALSE       FALSE  
#>  9 09           9 2022-09-10     3 Observed FALSE       FALSE  
#> 10 09          10 2022-09-17     4 Observed FALSE       FALSE  
#> 11 09          11 2022-09-24     3 Observed FALSE       FALSE  
#> 12 09          12 2022-10-01     2 Observed FALSE       FALSE  
#> 13 09          13 2022-10-08     3 Observed FALSE       FALSE  
#> 14 09          14 2022-10-15     9 Observed FALSE       FALSE  
#> 15 09          15 2022-10-22     9 Observed TRUE        FALSE  
#> 16 09          16 2022-10-29    37 Observed FALSE       FALSE  
#> 17 09          17 2022-11-05    74 Forecast FALSE       FALSE  
#> 18 09          18 2022-11-12    75 Forecast FALSE       FALSE  
#> 19 09          19 2022-11-19    75 Forecast FALSE       FALSE  
#> 20 09          20 2022-11-26    75 Forecast FALSE       FALSE  
#> 
#> $full_results$`09-trend`$flagged_dates
#> [1] NA
#> 
#> 
#> $full_results$`09-zero`
#> $full_results$`09-zero`$indicator
#> [1] FALSE




######################################################
######################################################
# Run tests on old and new plane_score for a variety of possible types of weights:
######################################################
######################################################


######################################################
# Test with integers
######################################################
comps <- c("cover", "taper", "diff")
wts <- c("cover" = 2, "taper" = 1, "diff" = 4)

# With plane_score as is:
integer_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
integer_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)[["scores_summary"]])

identical(integer_original, integer_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score




######################################################
# Test proportions that sum to 1
######################################################
wts2 <- wts/sum(wts)
sum(wts2)
#> [1] 1

# With plane_score as is:
props_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts2)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
props_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts2)[["scores_summary"]])

identical(props_original, props_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score IFF we never use n_flags_weighted or weights_denominator output




######################################################
# Test standardized weights
######################################################
wts3 <- wts/mean(wts)
sum(wts3)
#> [1] 3

# With plane_score as is:
mean_scaled_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts3)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
mean_scaled_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts3)[["scores_summary"]])

identical(mean_scaled_original, mean_scaled_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score IFF we never use n_flags_weighted or weights_denominator output




######################################################
# Test weights standardized around zero
######################################################
wts4 <- wts - mean(wts)
wts4
#>      cover      taper       diff 
#> -0.3333333 -1.3333333  1.6666667
round(mean(wts4), 1)
#> [1] 0
round(sum(wts4), 1)
#> [1] 0

# With plane_score as is:
centered_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts4)[["scores_summary"]])

# With new plane_score with error message for weights <= 0
centered_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts4)[["scores_summary"]])
#> Error in plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a positive number

# This did not work as is. Produced unexpected results. Might be fine if comparing apples to apples, but it makes the resulting score confusing. With the edited plane_score, it doesn't run and produces an error message.


results <- data.frame(integer_original, integer_new, props_original, props_new, mean_scaled_original, mean_scaled_new, centered_original, centered_new = "error message")
results
#>                         integer_original       integer_new    props_original
#> 09.location                           09                09                09
#> 09.n_flags                             2                 2                 2
#> 09.n_components                        3                 3                 3
#> 09.n_flags_weighted                    6                 6 0.857142857142857
#> 09.weights_denominator                 7                 7                 1
#> 09.score               0.857142857142857 0.857142857142857 0.857142857142857
#> 09.components           cover;diff;taper  cover;diff;taper  cover;diff;taper
#> 09.flagged                    cover;diff        cover;diff        cover;diff
#>                                props_new mean_scaled_original   mean_scaled_new
#> 09.location                           09                   09                09
#> 09.n_flags                             2                    2                 2
#> 09.n_components                        3                    3                 3
#> 09.n_flags_weighted    0.857142857142857     2.57142857142857  2.57142857142857
#> 09.weights_denominator                 1                    3                 3
#> 09.score               0.857142857142857    0.857142857142857 0.857142857142857
#> 09.components           cover;diff;taper     cover;diff;taper  cover;diff;taper
#> 09.flagged                    cover;diff           cover;diff        cover;diff
#>                            centered_original  centered_new
#> 09.location                               09 error message
#> 09.n_flags                                 2 error message
#> 09.n_components                            3 error message
#> 09.n_flags_weighted         1.33333333333333 error message
#> 09.weights_denominator -4.44089209850063e-16 error message
#> 09.score                   -3002399751580330 error message
#> 09.components               cover;diff;taper error message
#> 09.flagged                        cover;diff error message

Created on 2024-04-22 with reprex v2.0.2

@vpnagraj
Copy link
Collaborator

thanks @abenefield1 !

noting our follow up discussion for posterity ... yes, lets go with a constraint for weights >= 1

when youre done with that open a PR from the feature branch for this change to the v0.0.3 branch and i will review

@abenefield1
Copy link
Collaborator Author

@vpnagraj

Changed plane_score so weights must be >= 1. Also updated documentation and initiated a PR (#134).

Checked and there was no issue with integers vs reals

# Reprex of plane_score with updates
# plane_score has been changed so that weights must be >= 1
library(rplanes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union




######################################################
######################################################
# Read in data to test
######################################################
######################################################
## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
  filter(location == "09")

hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")
prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")

## read in example forecast and prep forecast signal
fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")
prepped_forecast <- read_forecast(fp) %>%
  filter(location == "09")%>%
  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)

## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")


######################################################
######################################################
# Run tests on new plane_score for a variety of possible 
# types of numbers specified for weights:
######################################################
######################################################


######################################################
# Test with integers
######################################################
comps <- c("cover", "taper", "diff")
int_wts <- c("cover" = 1, "taper" = 2, "diff" = 4)

integer <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = int_wts)[["scores_summary"]])
# This should work with new plane_score, and it does.




######################################################
# Test with reals
######################################################
real_wts <- int_wts*1.5
real_wts
#> cover taper  diff 
#>   1.5   3.0   6.0

real <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = real_wts)[["scores_summary"]])
# This should work with new plane_score, and it does.




######################################################
# Test proportions that sum to 1
######################################################
prop_wts <- int_wts/sum(int_wts)
prop_wts
#>     cover     taper      diff 
#> 0.1428571 0.2857143 0.5714286
sum(prop_wts)
#> [1] 1

props <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = prop_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because weights < 1




######################################################
# Test standardized (mean scaled) weights
######################################################
mean_scaled_wts <- int_wts/mean(int_wts)
sum(mean_scaled_wts)
#> [1] 3
mean_scaled_wts
#>     cover     taper      diff 
#> 0.4285714 0.8571429 1.7142857

mean_scaled <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = mean_scaled_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because some weights < 1




######################################################
# Test weights centered around zero
######################################################
zero_centered_wts <- int_wts - mean(int_wts)
zero_centered_wts
#>      cover      taper       diff 
#> -1.3333333 -0.3333333  1.6666667
round(mean(zero_centered_wts), 1)
#> [1] 0
round(sum(zero_centered_wts), 1)
#> [1] 0

# With plane_score as is:
zero_centered <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = zero_centered_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because some weights < 1




results <- data.frame(integer, real, props = "error",  mean_scaled = "error", zero_centered = "error")
results
#>                                  integer              real props mean_scaled
#> 09.location                           09                09 error       error
#> 09.n_flags                             2                 2 error       error
#> 09.n_components                        3                 3 error       error
#> 09.n_flags_weighted                    5               7.5 error       error
#> 09.weights_denominator                 7              10.5 error       error
#> 09.score               0.714285714285714 0.714285714285714 error       error
#> 09.components           cover;diff;taper  cover;diff;taper error       error
#> 09.flagged                    cover;diff        cover;diff error       error
#>                        zero_centered
#> 09.location                    error
#> 09.n_flags                     error
#> 09.n_components                error
#> 09.n_flags_weighted            error
#> 09.weights_denominator         error
#> 09.score                       error
#> 09.components                  error
#> 09.flagged                     error

Created on 2024-04-22 with reprex v2.0.2

@vpnagraj
Copy link
Collaborator

done and on the v0.0.3 branch (https://github.com/signaturescience/rplanes/tree/v0.0.3). will be merged to main with the v0.0.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants