Post-processing for H models #287

Merged
merged 80 commits into from
Feb 7, 2025

Conversation

damonbayer
Collaborator

@damonbayer damonbayer commented Jan 8, 2025

Unfortunately, this has become a bit of a monster PR, but it will result in a lot more consistency, clarity, and adaptability in the project.

Done

  • Improves the formatting of data files: replaces confusing names like "Disease" and "Other" with more descriptive names like "observed_ed_visits" and "other_ed_visits", in a tidy format.
  • Removes generation of legacy-formatted data files.
  • Simplifies and improves the output of timeseries models to more closely match that of PyRenew models.
  • Introduces group_time_index_to_date for robust conversion of PyRenew time indices to dates.
  • Introduces parse_pyrenew_model_name to extract the expected features (h, e, w) from the model's name.
  • Generalizes post-processing to work with models featuring any combination of h and e.
  • Changes the offset argument in model scoring to be 1, rather than 1 / max_visits, since max_visits is less well-defined when working with multiple targets.
  • There is now just a single scored.rds per model_run_dir. Different models and resolutions are indicated by their respective columns in these tables.
  • Fixes a bug in collecting eval data for hospital admissions (d10dcfb).
  • Mega epiweekly hubverse table is created, but I did not implement any post-processing functions for it.
  • Simplifies post-processing to remove redundant data generation.
  • Updates all parts of the pipeline to work with the newly formatted data files.
  • Major rewrite of the plot collation script.
  • No R CMD check complaints.
  • Updates hewr tests.
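
For readers unfamiliar with the two helpers introduced above, here is a minimal Python sketch of the ideas behind them. It is not the hewr implementation: the function names, signatures, the assumption that the feature letters follow the last underscore in the model name (e.g. "pyrenew_he"), and the assumption that time index 0 maps to a known first date are all hypothetical.

```python
from datetime import date, timedelta


def parse_model_name_sketch(model_name: str) -> dict[str, bool]:
    """Report which of the features h, e, w a model name declares.

    Assumption (hypothetical): the features are the letters after the
    last underscore, as in "pyrenew_he". The real convention may differ.
    """
    suffix = model_name.rsplit("_", 1)[-1].lower()
    unexpected = set(suffix) - set("hew")
    if not suffix or unexpected:
        raise ValueError(f"Unrecognized model name: {model_name!r}")
    return {letter: letter in suffix for letter in "hew"}


def time_index_to_date_sketch(
    time_index: int, first_date: date, days_per_step: int = 1
) -> date:
    """Map an integer time index to a calendar date.

    Assumption (hypothetical): index 0 corresponds to first_date and
    steps are evenly spaced (1 day for daily, 7 days for weekly data).
    """
    return first_date + timedelta(days=time_index * days_per_step)
```

With these definitions, `parse_model_name_sketch("pyrenew_he")` reports h and e as present and w as absent, which is the kind of information post-processing needs to decide which outputs to expect from a model.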

Out of Scope

  • Dashboard expects E models only.
  • Notions of "daily" vs "epiweekly" data are not necessarily correct throughout. Generally, "daily" means "unaggregated" and "epiweekly" means "aggregated."
  • Some aspects of the batch post-processing should be re-worked as needed to accommodate some concept of a "preferred" forecast, which may be a mixture of different models for different locations. For now, we generate many disease_category_pointintervals plots, one for each model. We may wish to instead make a single plot for the "preferred" forecast.
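
To make the "unaggregated" vs "aggregated" distinction concrete, the sketch below sums daily counts into epiweeks, which run Sunday through Saturday. It is illustrative only: the function names are hypothetical, and dropping incomplete weeks is an assumption rather than something this PR specifies.

```python
from collections import defaultdict
from datetime import date, timedelta


def epiweek_end(d: date) -> date:
    """Return the Saturday that ends the epiweek containing d."""
    # date.weekday(): Monday=0, ..., Saturday=5, Sunday=6
    return d + timedelta(days=(5 - d.weekday()) % 7)


def aggregate_to_epiweeks(daily: dict[date, int]) -> dict[date, int]:
    """Sum daily counts by epiweek, keyed by the week-ending Saturday.

    Assumption (hypothetical): weeks with fewer than 7 observed days
    are dropped rather than reported as partial totals.
    """
    totals: dict[date, int] = defaultdict(int)
    days_seen: dict[date, int] = defaultdict(int)
    for day, count in daily.items():
        end = epiweek_end(day)
        totals[end] += count
        days_seen[end] += 1
    return {end: totals[end] for end in sorted(totals) if days_seen[end] == 7}
```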

Closes

Closes #308
Closes #296

codecov bot commented Jan 8, 2025

Codecov Report

Attention: Patch coverage is 25.49505% with 301 lines in your changes missing coverage. Please review.

Project coverage is 24.47%. Comparing base (3a0a5bf) to head (792d3b1).
Report is 1 commit behind head on main.

Files with missing lines                     Patch %   Lines
hewr/R/process_state_forecast.R              5.00%     209 Missing ⚠️
pipelines/collate_plots.py                   0.00%     34 Missing ⚠️
hewr/R/make_forecast_figure.R                0.00%     25 Missing ⚠️
hewr/R/directory_utils.R                     15.38%    22 Missing ⚠️
pipelines/postprocess_forecast_batches.py    0.00%     7 Missing ⚠️
pipelines/prep_data.py                       0.00%     4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #287      +/-   ##
==========================================
- Coverage   25.56%   24.47%   -1.10%     
==========================================
  Files          22       22              
  Lines        1682     1704      +22     
==========================================
- Hits          430      417      -13     
- Misses       1252     1287      +35     
Flag           Coverage Δ
hewr           41.68% <28.69%> (-7.94%) ⬇️
pipelines      4.67% <0.00%> (+0.32%) ⬆️
pyrenew_hew    27.96% <ø> (ø)


Contributor

@dylanhmorris dylanhmorris left a comment

Thanks for this, @damonbayer. All my comments are addressed. Let me know when remaining test issues are fixed.

@damonbayer damonbayer marked this pull request as ready for review February 7, 2025 15:10
@damonbayer damonbayer requested a review from sbidari as a code owner February 7, 2025 15:10
@damonbayer damonbayer changed the title from "(DRAFT) Post-processing for H models" to "Post-processing for H models" Feb 7, 2025
@dylanhmorris
Contributor

Going to try an end-to-end run now.

@dylanhmorris
Contributor

Fitting and postprocessing run end to end! Thank you, @damonbayer.

Two things before merge:

  1. Afaict everything looks as expected, except that the titles on the prop_disease_ed_visits plots are wrong. They say "Other Emergency Department Visits". Presumably a bug in the figure labeling logic somewhere. I think this is worth fixing before merge, as it should be a quick fix.
  2. The mega hubverse table is big enough that I don't think it's going to be readily human-readable, and any submitted hubverse tables will be downsampled from it. Given that, worth saving it as parquet rather than .tsv?

@damonbayer
Collaborator Author

@dylanhmorris I've addressed both of your most recent comments.

Contributor

@dylanhmorris dylanhmorris left a comment

LGTM. Thanks for all your hard work, @damonbayer!

@dylanhmorris dylanhmorris enabled auto-merge (squash) February 7, 2025 23:52
@dylanhmorris dylanhmorris merged commit 216a81b into main Feb 7, 2025
14 of 15 checks passed
@dylanhmorris dylanhmorris deleted the dmb_multi_postprocess branch February 7, 2025 23:52