Explicitly drop date/year/month columns from metadata during grouping
- `date` column should not be used for grouping - use generated columns instead.
- Any `year`/`month` columns in original metadata should be overridden by generated columns.

Without dropping explicitly, a cryptic pandas ValueError occurs.
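The drop-then-generate idea described above can be sketched in pandas as follows. This is a minimal illustration, not Augur's actual code: the helper name `add_generated_date_columns` and the constant `GENERATED_COLUMNS` are hypothetical, and the sketch assumes every `date` value is a complete `YYYY-MM-DD` string. The commit message does not say where the `ValueError` is raised; the sketch only assumes that a leftover `year` column would collide with the generated one.

```python
import pandas as pd

# Hypothetical names for illustration; not taken from Augur's source.
GENERATED_COLUMNS = ["year", "month"]


def add_generated_date_columns(metadata: pd.DataFrame) -> pd.DataFrame:
    """Return a grouping frame whose year/month are derived from `date`."""
    # Assumes complete ISO dates (YYYY-MM-DD) in the `date` column.
    parsed = pd.to_datetime(metadata["date"], format="%Y-%m-%d")
    generated = pd.DataFrame(
        {"year": parsed.dt.year, "month": parsed.dt.month},
        index=metadata.index,
    )
    # Drop `date` and any pre-existing year/month columns first, so the
    # concat below cannot produce duplicate column labels.
    to_drop = [c for c in ["date", *GENERATED_COLUMNS] if c in metadata.columns]
    return pd.concat([metadata.drop(columns=to_drop), generated], axis=1)


metadata = pd.DataFrame(
    {
        "strain": ["SEQ1", "SEQ2", "SEQ3", "SEQ4", "SEQ5"],
        "date": ["2021-01-01", "2021-01-02", "2022-01-01", "2022-01-02", "2022-02-02"],
        "year": ["odd", "odd", "even", "even", "even"],  # custom column
    }
)

grouping = add_generated_date_columns(metadata)
group_sizes = grouping.groupby(["year", "month"]).size().to_dict()
print(group_sizes)
```

Because the grouping frame is a separate copy, the original `metadata` frame still carries the custom `year` values and can be used to write the final output.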
Showing 2 changed files with 38 additions and 0 deletions.
30 changes: 30 additions & 0 deletions
tests/functional/filter/cram/subsample-group-by-with-custom-year-column.t
@@ -0,0 +1,30 @@
Setup

  $ pushd "$TESTDIR" > /dev/null
  $ source _setup.sh

Create a metadata file with a custom year column

  $ cat >$TMP/metadata-year-column.tsv <<~~
  > strain	date	year
  > SEQ1	2021-01-01	odd
  > SEQ2	2021-01-02	odd
  > SEQ3	2022-01-01	even
  > SEQ4	2022-01-02	even
  > SEQ5	2022-02-02	even
  > ~~

Filter by generated date columns, and ensure the custom year column is still in the final output.

  $ ${AUGUR} filter \
  >  --metadata $TMP/metadata-year-column.tsv \
  >  --group-by year month \
  >  --sequences-per-group 1 \
  >  --subsample-seed 0 \
  >  --output-metadata "$TMP/filtered_metadata.tsv" > /dev/null
  WARNING: For --group-by purposes, the 'year' column in metadata will be overridden by the generated value from 'date' column.
  $ cat "$TMP/filtered_metadata.tsv"
  strain\tdate\tyear (esc)
  SEQ1\t2021-01-01\todd (esc)
  SEQ3\t2022-01-01\teven (esc)
  SEQ5\t2022-02-02\teven (esc)
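
The behaviour this test checks can be replayed with the standard library alone. The sketch below is a simplified stand-in, not `augur filter` itself: the function name `first_row_per_year_month` is hypothetical, and it keeps the first row of each group instead of sampling randomly with a seed. It groups by the year and month parsed from `date` while leaving the custom `year` column untouched in the rows it returns.

```python
import csv
import io

# Same rows as the metadata file created in the test above.
METADATA_TSV = (
    "strain\tdate\tyear\n"
    "SEQ1\t2021-01-01\todd\n"
    "SEQ2\t2021-01-02\todd\n"
    "SEQ3\t2022-01-01\teven\n"
    "SEQ4\t2022-01-02\teven\n"
    "SEQ5\t2022-02-02\teven\n"
)


def first_row_per_year_month(tsv_text):
    """Keep one row per (year, month) derived from `date`.

    Simplification: takes the first row seen in each group, whereas
    the real command samples randomly using --subsample-seed.
    """
    kept = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        year, month, _day = row["date"].split("-")
        # The grouping key comes from `date`, never from row["year"].
        kept.setdefault((int(year), int(month)), row)
    return list(kept.values())


kept_rows = first_row_per_year_month(METADATA_TSV)
for row in kept_rows:
    # Original columns, including the custom `year`, are written as-is.
    print("\t".join(row[column] for column in ("strain", "date", "year")))
```

With this input the three groups are (2021, 1), (2022, 1) and (2022, 2), so three rows are kept, each still carrying its original `odd`/`even` value.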