-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify weighting scheme documentation #126
Comments
@vpnagraj, waiting on answer to the question about if we use n_flags_weighted or weights_denominator in plane_score output to both clarify the documentation and/or add a constraint (depending on answer, weights either >0 or >1) inside of the plane_score() function. In the meantime, here's the reprex that shows the outputs. It should help up decide what we want to do :) # Reprex of plane_score weighting confusion.
######################################################
######################################################
# load rplanes and the original plane_score:
######################################################
######################################################
library(rplanes)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
######################################################
######################################################
# New plane_score:
######################################################
######################################################
# Proposed new plane_score that generates an error message for weights <= 0. May want to change this depending on answer about usin n_flags_weighted and weights_denominator
plane_score2 <- function(input, seed, components = "all", args = NULL, weights = NULL) {
## TODO: create this list as a built-in object?
## NOTE: consider using getFromNamespace to simplify this step
complist <-
list(cover = list(.function = plane_cover),
diff = list(.function = plane_diff),
taper = list(.function = plane_taper),
`repeat` = list(.function = plane_repeat),
trend = list(.function = plane_trend),
shape = list(.function = plane_shape),
zero = list(.function = plane_zero)
)
## verify components for signal type ... some won't apply to observed
allowed_observed <- c("repeat","diff","zero")
## handle condition when "all" components are requested
## observed data will only have a subset (the allowed compoments above)
if(length(components) == 1 && components == "all") {
if(is_observed(input)) {
components <- allowed_observed
} else {
components <- names(complist)
}
} else {
## now handle case when components have been defined as a character vector (not as "all")
if(is_observed(input)) {
if(any(!components %in% allowed_observed)) {
warning(paste0("Only the following components are allowed for observed signals: ", paste0(allowed_observed, collapse = ";")))
}
components <- components[components %in% allowed_observed]
## check if components vector is empty and if so stop
if(length(components) == 0) {
stop("The signal is observed but none of the allowed components for observed signals were specified.")
}
}
}
## get all possible locations
## TODO: make this easier to access in prepped signal by adding locations element
locs <- unique(input$data$location)
## create combinations of locations and components for mapping below
to_map <- tidyr::crossing(locs = locs, comps = components)
## TODO: better way do this mapping and tracking of location / components by name
## NOTE: the !!!args[[.x]] will look for the name of the component in the list of args ...
## ... if it is not there (e.g., args = NULL) then the function will proceed with no additional args ...
## ... if it is there then the !!! will splice the named arguments passed in a list and apply them
retl <-
purrr::map2(to_map$locs, to_map$comps, ~ purrr::exec(complist[[.y]]$.function, location = .x, input = input, seed = seed, !!!args[[.y]])) %>%
purrr::set_names(paste0(to_map$locs, "-", to_map$comps))
## grab full list of component results
full_results <- purrr::map(retl, purrr::pluck)
## separator regex
## this should create something like "-(?=[diff|trend|taper])"
## when applied below that will split *only on hyphens that are followed by diff or trend or taper
sep_rx <- paste0("-(?=[", paste0(components,collapse = "|"), "])")
## pull out summary tibble components and locations from the returned list above
loc_tbl <-
dplyr::tibble(loc_component = names(retl), indicator = purrr::map_lgl(retl, "indicator")) %>%
tidyr::separate(.data$loc_component, into = c("location", "component"), sep = sep_rx)
which_flags <-
loc_tbl %>%
dplyr::group_by(.data$location) %>%
dplyr::filter(.data$indicator) %>%
dplyr::summarise(flagged = paste0(.data$component, collapse = ";"))
## construct a tibble with weights for components
## if the weights argument is NULL then apply equal weights to all components
if(is.null(weights)) {
weights_tbl <- dplyr::tibble(component = components, weight = 1)
} else {
if(any(weights <= 0)) {
stop("Weights must be a positive number")
}
if(!all(sort(names(weights)) == sort(components))) {
stop("Weights must be provided as a vector with all components used included by name (e.g., c('diff' = 4, 'cover' = 1))")
}
weights_tbl <- dplyr::tibble(component = names(weights), weight = weights)
}
## convert the tibble into a list
loc_list <-
loc_tbl %>%
## join to weights tbl defined above
dplyr::left_join(weights_tbl, by = "component") %>%
## count number of flags (numerator for score)
## count number of components (denominator for score)
## convert to score
## hold onto the names of the components used
dplyr::group_by(.data$location) %>%
dplyr::summarise(n_flags = sum(.data$indicator),
n_components = dplyr::n(),
n_flags_weighted = sum(.data$indicator * .data$weight),
weights_denominator = sum(.data$weight),
score = .data$n_flags_weighted / .data$weights_denominator,
components = paste0(.data$component, collapse = ";")
) %>%
## join back to tibble that enumerates which components were flagged
dplyr::left_join(which_flags, by = "location") %>%
## split into a list by location
dplyr::group_split(.data$location, .keep = TRUE) %>%
## use the location column from the tibble in each split list element as name
purrr::set_names(purrr::map_chr(., "location")) %>%
## make sure it is a list of lists (not a list of tibbles)
purrr::map(., as.list)
return(list(scores_summary = loc_list, scores_raw = loc_tbl, full_results = full_results))
}
######################################################
######################################################
# Read in data to test
######################################################
######################################################
## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
filter(location == "09")
hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")
prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")
## read in example forecast and prep forecast signal
fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")
prepped_forecast <- read_forecast(fp) %>%
filter(location == "09")%>%
to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)
## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")
## run plane scoring with all components
plane_score(input = prepped_forecast, seed = prepped_seed)
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#>
#> $scores_summary$`09`$n_flags
#> [1] 4
#>
#> $scores_summary$`09`$n_components
#> [1] 7
#>
#> $scores_summary$`09`$n_flags_weighted
#> [1] 4
#>
#> $scores_summary$`09`$weights_denominator
#> [1] 7
#>
#> $scores_summary$`09`$score
#> [1] 0.5714286
#>
#> $scores_summary$`09`$components
#> [1] "cover;diff;repeat;shape;taper;trend;zero"
#>
#> $scores_summary$`09`$flagged
#> [1] "cover;diff;repeat;shape"
#>
#>
#>
#> $scores_raw
#> # A tibble: 7 × 3
#> location component indicator
#> <chr> <chr> <lgl>
#> 1 09 cover TRUE
#> 2 09 diff TRUE
#> 3 09 repeat TRUE
#> 4 09 shape TRUE
#> 5 09 taper FALSE
#> 6 09 trend FALSE
#> 7 09 zero FALSE
#>
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#>
#> $full_results$`09-cover`$last_value
#> [1] 37
#>
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#>
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#>
#>
#>
#> $full_results$`09-diff`
#> $full_results$`09-diff`$indicator
#> [1] TRUE
#>
#> $full_results$`09-diff`$values
#> [1] 37 74 75 75 75
#>
#> $full_results$`09-diff`$evaluated_differences
#> [1] 37 1 0 0
#>
#> $full_results$`09-diff`$maximum_difference
#> [1] 29
#>
#>
#> $full_results$`09-repeat`
#> $full_results$`09-repeat`$indicator
#> [1] TRUE
#>
#> $full_results$`09-repeat`$repeats
#> # A tibble: 3 × 6
#> point location date horizon lower upper
#> <dbl> <chr> <date> <chr> <dbl> <dbl>
#> 1 75 09 2022-11-12 2 50 100
#> 2 75 09 2022-11-19 3 43 106
#> 3 75 09 2022-11-26 4 38 112
#>
#>
#> $full_results$`09-shape`
#> $full_results$`09-shape`$indicator
#> [1] TRUE
#>
#>
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#>
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
#>
#>
#> $full_results$`09-trend`
#> $full_results$`09-trend`$indicator
#> [1] FALSE
#>
#> $full_results$`09-trend`$output
#> # A tibble: 20 × 7
#> Location Index Date Value Type Changepoint Flagged
#> <chr> <int> <date> <dbl> <chr> <lgl> <lgl>
#> 1 09 1 2022-07-16 3 Observed FALSE FALSE
#> 2 09 2 2022-07-23 6 Observed FALSE FALSE
#> 3 09 3 2022-07-30 4 Observed FALSE FALSE
#> 4 09 4 2022-08-06 3 Observed FALSE FALSE
#> 5 09 5 2022-08-13 0 Observed FALSE FALSE
#> 6 09 6 2022-08-20 2 Observed FALSE FALSE
#> 7 09 7 2022-08-27 9 Observed FALSE FALSE
#> 8 09 8 2022-09-03 4 Observed FALSE FALSE
#> 9 09 9 2022-09-10 3 Observed FALSE FALSE
#> 10 09 10 2022-09-17 4 Observed FALSE FALSE
#> 11 09 11 2022-09-24 3 Observed FALSE FALSE
#> 12 09 12 2022-10-01 2 Observed FALSE FALSE
#> 13 09 13 2022-10-08 3 Observed FALSE FALSE
#> 14 09 14 2022-10-15 9 Observed FALSE FALSE
#> 15 09 15 2022-10-22 9 Observed TRUE FALSE
#> 16 09 16 2022-10-29 37 Observed FALSE FALSE
#> 17 09 17 2022-11-05 74 Forecast FALSE FALSE
#> 18 09 18 2022-11-12 75 Forecast FALSE FALSE
#> 19 09 19 2022-11-19 75 Forecast FALSE FALSE
#> 20 09 20 2022-11-26 75 Forecast FALSE FALSE
#>
#> $full_results$`09-trend`$flagged_dates
#> [1] NA
#>
#>
#> $full_results$`09-zero`
#> $full_results$`09-zero`$indicator
#> [1] FALSE
## run plane scoring with select components
plane_score(input = prepped_forecast, seed = prepped_seed, components = c("cover","taper"))
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#>
#> $scores_summary$`09`$n_flags
#> [1] 1
#>
#> $scores_summary$`09`$n_components
#> [1] 2
#>
#> $scores_summary$`09`$n_flags_weighted
#> [1] 1
#>
#> $scores_summary$`09`$weights_denominator
#> [1] 2
#>
#> $scores_summary$`09`$score
#> [1] 0.5
#>
#> $scores_summary$`09`$components
#> [1] "cover;taper"
#>
#> $scores_summary$`09`$flagged
#> [1] "cover"
#>
#>
#>
#> $scores_raw
#> # A tibble: 2 × 3
#> location component indicator
#> <chr> <chr> <lgl>
#> 1 09 cover TRUE
#> 2 09 taper FALSE
#>
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#>
#> $full_results$`09-cover`$last_value
#> [1] 37
#>
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#>
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#>
#>
#>
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#>
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
## run plane scoring with all components and additional args
comp_args <- list("trend" = list("sig_lvl" = 0.05), "repeat" = list("prepend" = 4, "tolerance" = 8))
plane_score(input = prepped_forecast, seed = prepped_seed, args = comp_args)
#> $scores_summary
#> $scores_summary$`09`
#> $scores_summary$`09`$location
#> [1] "09"
#>
#> $scores_summary$`09`$n_flags
#> [1] 3
#>
#> $scores_summary$`09`$n_components
#> [1] 7
#>
#> $scores_summary$`09`$n_flags_weighted
#> [1] 3
#>
#> $scores_summary$`09`$weights_denominator
#> [1] 7
#>
#> $scores_summary$`09`$score
#> [1] 0.4285714
#>
#> $scores_summary$`09`$components
#> [1] "cover;diff;repeat;shape;taper;trend;zero"
#>
#> $scores_summary$`09`$flagged
#> [1] "cover;diff;shape"
#>
#>
#>
#> $scores_raw
#> # A tibble: 7 × 3
#> location component indicator
#> <chr> <chr> <lgl>
#> 1 09 cover TRUE
#> 2 09 diff TRUE
#> 3 09 repeat FALSE
#> 4 09 shape TRUE
#> 5 09 taper FALSE
#> 6 09 trend FALSE
#> 7 09 zero FALSE
#>
#> $full_results
#> $full_results$`09-cover`
#> $full_results$`09-cover`$indicator
#> [1] TRUE
#>
#> $full_results$`09-cover`$last_value
#> [1] 37
#>
#> $full_results$`09-cover`$bounds
#> $full_results$`09-cover`$bounds$lower
#> [1] 57
#>
#> $full_results$`09-cover`$bounds$upper
#> [1] 91
#>
#>
#>
#> $full_results$`09-diff`
#> $full_results$`09-diff`$indicator
#> [1] TRUE
#>
#> $full_results$`09-diff`$values
#> [1] 37 74 75 75 75
#>
#> $full_results$`09-diff`$evaluated_differences
#> [1] 37 1 0 0
#>
#> $full_results$`09-diff`$maximum_difference
#> [1] 29
#>
#>
#> $full_results$`09-repeat`
#> $full_results$`09-repeat`$indicator
#> [1] FALSE
#>
#> $full_results$`09-repeat`$repeats
#> # A tibble: 0 × 6
#> # ℹ 6 variables: point <dbl>, location <chr>, date <date>, horizon <chr>,
#> # lower <dbl>, upper <dbl>
#>
#>
#> $full_results$`09-shape`
#> $full_results$`09-shape`$indicator
#> [1] TRUE
#>
#>
#> $full_results$`09-taper`
#> $full_results$`09-taper`$indicator
#> [1] FALSE
#>
#> $full_results$`09-taper`$widths
#> [1] 34 50 63 74
#>
#>
#> $full_results$`09-trend`
#> $full_results$`09-trend`$indicator
#> [1] FALSE
#>
#> $full_results$`09-trend`$output
#> # A tibble: 20 × 7
#> Location Index Date Value Type Changepoint Flagged
#> <chr> <int> <date> <dbl> <chr> <lgl> <lgl>
#> 1 09 1 2022-07-16 3 Observed FALSE FALSE
#> 2 09 2 2022-07-23 6 Observed FALSE FALSE
#> 3 09 3 2022-07-30 4 Observed FALSE FALSE
#> 4 09 4 2022-08-06 3 Observed FALSE FALSE
#> 5 09 5 2022-08-13 0 Observed FALSE FALSE
#> 6 09 6 2022-08-20 2 Observed FALSE FALSE
#> 7 09 7 2022-08-27 9 Observed FALSE FALSE
#> 8 09 8 2022-09-03 4 Observed FALSE FALSE
#> 9 09 9 2022-09-10 3 Observed FALSE FALSE
#> 10 09 10 2022-09-17 4 Observed FALSE FALSE
#> 11 09 11 2022-09-24 3 Observed FALSE FALSE
#> 12 09 12 2022-10-01 2 Observed FALSE FALSE
#> 13 09 13 2022-10-08 3 Observed FALSE FALSE
#> 14 09 14 2022-10-15 9 Observed FALSE FALSE
#> 15 09 15 2022-10-22 9 Observed TRUE FALSE
#> 16 09 16 2022-10-29 37 Observed FALSE FALSE
#> 17 09 17 2022-11-05 74 Forecast FALSE FALSE
#> 18 09 18 2022-11-12 75 Forecast FALSE FALSE
#> 19 09 19 2022-11-19 75 Forecast FALSE FALSE
#> 20 09 20 2022-11-26 75 Forecast FALSE FALSE
#>
#> $full_results$`09-trend`$flagged_dates
#> [1] NA
#>
#>
#> $full_results$`09-zero`
#> $full_results$`09-zero`$indicator
#> [1] FALSE
######################################################
######################################################
# Run tests on old and new plane_score for a variety of possible types of weights:
######################################################
######################################################
######################################################
# Test with integers
######################################################
comps <- c("cover", "taper", "diff")
wts <- c("cover" = 2, "taper" = 1, "diff" = 4)
# With plane_score as is:
integer_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)[["scores_summary"]])
# With new plane_score with error message for weights <= 0
integer_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)[["scores_summary"]])
identical(integer_original, integer_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score
######################################################
# Test proportions that sum to 1
######################################################
wts2 <- wts/sum(wts)
sum(wts2)
#> [1] 1
# With plane_score as is:
props_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts2)[["scores_summary"]])
# With new plane_score with error message for weights <= 0
props_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts2)[["scores_summary"]])
identical(props_original, props_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score IFF we never use n_flags_weighted or weights_denominator output
######################################################
# Test standardized weights
######################################################
wts3 <- wts/mean(wts)
sum(wts3)
#> [1] 3
# With plane_score as is:
mean_scaled_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts3)[["scores_summary"]])
# With new plane_score with error message for weights <= 0
mean_scaled_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts3)[["scores_summary"]])
identical(mean_scaled_original, mean_scaled_new)
#> [1] TRUE
# This works as is, and there is no change with the new proposed plane_score IFF we never use n_flags_weighted or weights_denominator output
######################################################
# Test weights standardized around zero
######################################################
wts4 <- wts - mean(wts)
wts4
#> cover taper diff
#> -0.3333333 -1.3333333 1.6666667
round(mean(wts4), 1)
#> [1] 0
round(sum(wts4), 1)
#> [1] 0
# With plane_score as is:
centered_original <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts4)[["scores_summary"]])
# With new plane_score with error message for weights <= 0
centered_new <- unlist(plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts4)[["scores_summary"]])
#> Error in plane_score2(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a positive number
# This did not work as is. Produced unexpected results. Might be fine if comparing apples to apples, but it makes the resulting score confusing. With the edited plane_score, it doesn't run and produces an error message.
results <- data.frame(integer_original, integer_new, props_original, props_new, mean_scaled_original, mean_scaled_new, centered_original, centered_new = "error message")
results
#> integer_original integer_new props_original
#> 09.location 09 09 09
#> 09.n_flags 2 2 2
#> 09.n_components 3 3 3
#> 09.n_flags_weighted 6 6 0.857142857142857
#> 09.weights_denominator 7 7 1
#> 09.score 0.857142857142857 0.857142857142857 0.857142857142857
#> 09.components cover;diff;taper cover;diff;taper cover;diff;taper
#> 09.flagged cover;diff cover;diff cover;diff
#> props_new mean_scaled_original mean_scaled_new
#> 09.location 09 09 09
#> 09.n_flags 2 2 2
#> 09.n_components 3 3 3
#> 09.n_flags_weighted 0.857142857142857 2.57142857142857 2.57142857142857
#> 09.weights_denominator 1 3 3
#> 09.score 0.857142857142857 0.857142857142857 0.857142857142857
#> 09.components cover;diff;taper cover;diff;taper cover;diff;taper
#> 09.flagged cover;diff cover;diff cover;diff
#> centered_original centered_new
#> 09.location 09 error message
#> 09.n_flags 2 error message
#> 09.n_components 3 error message
#> 09.n_flags_weighted 1.33333333333333 error message
#> 09.weights_denominator -4.44089209850063e-16 error message
#> 09.score -3002399751580330 error message
#> 09.components cover;diff;taper error message
#> 09.flagged cover;diff error message Created on 2024-04-22 with reprex v2.0.2 |
thanks @abenefield1 ! noting our follow up discussion for posterity ... yes, lets go with a constraint for weights >= 1 when youre done with that open a PR from the feature branch for this change to the v0.0.3 branch and i will review |
Changed plane_score so weights must be >= 1. Also updated documentation and initiated a PR (#134). Checked and there was no issue with integers vs reals # Reprex of plane_score with updates
# plane_score has been changed so that weights must be >= 1
library(rplanes)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
######################################################
######################################################
# Read in data to test
######################################################
######################################################
## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) %>%
filter(location == "09")
hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")
prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")
## read in example forecast and prep forecast signal
fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")
prepped_forecast <- read_forecast(fp) %>%
filter(location == "09")%>%
to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)
## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")
######################################################
######################################################
# Run tests on new plane_score for a variety of possible
# types of numbers specified for weights:
######################################################
######################################################
######################################################
# Test with integers
######################################################
comps <- c("cover", "taper", "diff")
int_wts <- c("cover" = 1, "taper" = 2, "diff" = 4)
integer <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = int_wts)[["scores_summary"]])
# This should work with new plane_score, and it does.
######################################################
# Test with reals
######################################################
real_wts <- int_wts*1.5
real_wts
#> cover taper diff
#> 1.5 3.0 6.0
real <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = real_wts)[["scores_summary"]])
# This should work with new plane_score, and it does.
######################################################
# Test proportions that sum to 1
######################################################
prop_wts <- int_wts/sum(int_wts)
prop_wts
#> cover taper diff
#> 0.1428571 0.2857143 0.5714286
sum(prop_wts)
#> [1] 1
props <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = prop_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because weights < 1
######################################################
# Test standardized (mean scaled) weights
######################################################
mean_scaled_wts <- int_wts/mean(int_wts)
sum(mean_scaled_wts)
#> [1] 3
mean_scaled_wts
#> cover taper diff
#> 0.4285714 0.8571429 1.7142857
mean_scaled <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = mean_scaled_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because some weights < 1
######################################################
# Test weights centered around zero
######################################################
zero_centered_wts <- int_wts - mean(int_wts)
zero_centered_wts
#> cover taper diff
#> -1.3333333 -0.3333333 1.6666667
round(mean(zero_centered_wts), 1)
#> [1] 0
round(sum(zero_centered_wts), 1)
#> [1] 0
# With plane_score as is:
zero_centered <- unlist(plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = zero_centered_wts)[["scores_summary"]])
#> Error in plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, : Weights must be a real number >= 1
# Correctly breaks with new plane_score(), because some weights < 1
results <- data.frame(integer, real, props = "error", mean_scaled = "error", zero_centered = "error")
results
#> integer real props mean_scaled
#> 09.location 09 09 error error
#> 09.n_flags 2 2 error error
#> 09.n_components 3 3 error error
#> 09.n_flags_weighted 5 7.5 error error
#> 09.weights_denominator 7 10.5 error error
#> 09.score 0.714285714285714 0.714285714285714 error error
#> 09.components cover;diff;taper cover;diff;taper error error
#> 09.flagged cover;diff cover;diff error error
#> zero_centered
#> 09.location error
#> 09.n_flags error
#> 09.n_components error
#> 09.n_flags_weighted error
#> 09.weights_denominator error
#> 09.score error
#> 09.components error
#> 09.flagged error Created on 2024-04-22 with reprex v2.0.2 |
done and on the v0.0.3 branch (https://github.com/signaturescience/rplanes/tree/v0.0.3). will be merged to main with the v0.0.3 release. |
@vpnagraj , the way we currently have the plane_score weighting documented is a little unclear. It doesn't matter too much, but I think we can improve it. There's no indication in the documentation about what scale the weights need to be on. The way it's coded, it doesn't actually matter as long as the weights are >1, but it might save someone some time if we make that a little clearer. For example, I thought that the weighting scheme was on a scale of 0:1, which produces different results.
Also, it seems like someone could weigh something >>> 1. Which is fine I guess, but we might want to set bounds.
With the weights as they are currently, it's not super intuitive when we talk about "reducing the weights of some components," because the only way to effectively do that is to raise the weights of others.
The text was updated successfully, but these errors were encountered: