
filter: Explicitly drop date/year/month columns from metadata during grouping #967


victorlin
Member

Description of proposed changes

  • The date column should not be used for grouping; use the generated year/month columns instead.
  • Any year/month columns in the original metadata should be overridden by the generated columns. Without dropping them explicitly, a cryptic pandas ValueError occurs.
  • Clarify the help string for --group-by.
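The first two points can be illustrated with a minimal pandas sketch. It is not augur's implementation. The function name `prepare_grouping_metadata`, the use of `pd.to_datetime` in place of augur's own date parsing, and the choice to reject `date` with an error are all assumptions made for illustration. The sketch shows why the explicit drop matters: `DataFrame.join` refuses to combine frames that share a column name and raises a ValueError. A user-supplied `year` column therefore has to be removed before the generated one is added.

```python
import pandas as pd

# Hypothetical names: these mirror the PR description, not augur's actual code.
GENERATED_COLUMNS = ["year", "month"]


def prepare_grouping_metadata(metadata: pd.DataFrame, group_by: list) -> pd.DataFrame:
    """Return the grouping columns, with year/month always derived from 'date'."""
    if "date" in group_by:
        # Assumption: this sketch rejects raw dates outright; augur may handle it differently.
        raise ValueError("Cannot group by 'date'; use 'year' and/or 'month' instead.")

    # Assumption: plain pandas parsing stands in for augur's own date handling.
    dates = pd.to_datetime(metadata["date"], errors="coerce")
    generated = pd.DataFrame(
        {"year": dates.dt.year, "month": dates.dt.month},
        index=metadata.index,
    )

    # Explicitly drop 'date' and any user-supplied year/month columns so the
    # generated columns take precedence. Without this drop, DataFrame.join
    # raises a ValueError about overlapping columns.
    base = metadata.drop(columns=["date", *GENERATED_COLUMNS], errors="ignore")
    return base.join(generated)[group_by]


if __name__ == "__main__":
    meta = pd.DataFrame(
        {"region": ["Asia", "Europe"], "date": ["2020-03-01", "2021-07-15"], "year": ["x", "y"]},
        index=["A", "B"],
    )
    # The custom 'year' values ("x", "y") are replaced by the generated ones.
    print(prepare_grouping_metadata(meta, ["region", "year", "month"]))
```

Dropping with `errors="ignore"` keeps the helper working whether or not the metadata already contains `year` or `month`.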

Related issue(s)

Testing

Added a functional test.

@victorlin victorlin requested a review from a team June 8, 2022 00:33
@victorlin victorlin self-assigned this Jun 8, 2022
codecov bot commented Jun 8, 2022

Codecov Report

Merging #967 (0e3f122) into master (a51bf3f) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
- Coverage   59.32%   59.26%   -0.06%     
==========================================
  Files          50       50              
  Lines        6259     6265       +6     
  Branches     1585     1588       +3     
==========================================
  Hits         3713     3713              
- Misses       2285     2290       +5     
- Partials      261      262       +1     
Impacted Files        Coverage Δ
augur/filter.py       96.14% <100.00%> (+0.04%) ⬆️
augur/ancestral.py    66.37% <0.00%> (-5.18%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a51bf3f...0e3f122.

@joverlee521 (Contributor) left a comment

Phew, that was a lot of context to catch up on in Slack and the related issue. It makes sense to me to only support grouping by year/month that is generated from the 'date' column 👍

I left some suggestions on the help and warning messages to make this behavior clearer to users. Also, should there be a separate test for a custom 'month' column, since it is technically handled separately from the 'year' column?

@victorlin victorlin force-pushed the filter/clarify-generated-date-column-behavior branch 2 times, most recently from fd3fd12 to 1f53b63 on June 30, 2022 00:17
victorlin and others added 2 commits June 29, 2022 17:18
- `date` column should not be used for grouping - use generated columns instead.
- Any `year`/`month` columns in original metadata should be overridden by generated columns. Without dropping explicitly, a cryptic pandas ValueError occurs.

Co-authored-by: Jover Lee <joverlee521@gmail.com>
@victorlin victorlin force-pushed the filter/clarify-generated-date-column-behavior branch from 1f53b63 to 0e3f122 on June 30, 2022 00:18
@victorlin victorlin requested a review from joverlee521 June 30, 2022 00:22
@victorlin victorlin merged commit d431b72 into nextstrain:master Jun 30, 2022
@victorlin victorlin deleted the filter/clarify-generated-date-column-behavior branch June 30, 2022 17:21
@victorlin victorlin added this to the Next release X.X.X milestone Jun 30, 2022
Development

Successfully merging this pull request may close these issues.

BUG: augur filter does not allow year column in metadata, crashes with ValueError
2 participants