Try converting all columns to numerical type
The dtype inference in augur.io.read_metadata does not support numerical
columns with empty values (because it calls pandas.read_csv with
na_filter=False¹). This gets around that limitation by converting
columns before querying.

I also considered infer_objects and convert_dtypes, but those are not
useful here since they only support soft (not hard) conversions².

¹ a1bfce4
² https://stackoverflow.com/a/60278450
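
The soft versus hard distinction above can be illustrated with a minimal sketch. The sample values are made up, and the Series is given an explicit object dtype to stand in for a column whose values were read as strings.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up values: numbers stored as strings, with one empty value.
coverage = pd.Series(["0.94", "", "0.95"], dtype=object)

# Soft conversions only pick a better dtype for the values as they are,
# so strings stay strings.
soft_inferred = coverage.infer_objects()
soft_converted = coverage.convert_dtypes()

# A hard conversion parses each string; the empty string becomes NaN.
hard = pd.to_numeric(coverage)

print(soft_inferred.dtype, soft_converted.dtype, hard.dtype)
```

Only the `pd.to_numeric` result is numeric, which is what a comparison such as `coverage >= 0.95` needs.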
victorlin committed Jul 28, 2023
1 parent e914cab commit b325b97
Showing 2 changed files with 9 additions and 7 deletions.
4 changes: 4 additions & 0 deletions augur/filter/include_exclude_rules.py
@@ -178,6 +178,10 @@ def filter_by_query(metadata, query) -> FilterFunctionReturn:
     set()
     """
+    # Try converting all columns to numeric.
+    for column in metadata.columns:
+        metadata[column] = pd.to_numeric(metadata[column], errors='ignore')
+
     return set(metadata.query(query).index.values)
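
For context, a standalone sketch of the convert-then-query approach follows. It is not the code from this commit: the helper name `convert_numeric_columns` and the sample data are hypothetical, and it catches exceptions explicitly instead of passing `errors='ignore'`, an option that pandas deprecated in version 2.2. It also works on a copy rather than modifying the caller's DataFrame.

```python
import pandas as pd


def convert_numeric_columns(metadata: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every fully parseable column converted to numeric.

    Hypothetical helper for illustration; not part of Augur.
    """
    converted = metadata.copy()
    for column in converted.columns:
        try:
            converted[column] = pd.to_numeric(converted[column])
        except (ValueError, TypeError):
            # Column holds non-numeric values; leave it unchanged.
            pass
    return converted


# Made-up metadata, with all values as strings and one empty coverage value.
metadata = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "coverage": ["0.94", "0.95", "0.96", ""],
    },
    index=pd.Index(["SEQ_1", "SEQ_2", "SEQ_3", "SEQ_4"], name="strain"),
    dtype=object,
)

matches = set(convert_numeric_columns(metadata).query("coverage >= 0.95").index.values)
print(sorted(matches))
```

The `country` column fails to parse and is left as strings, so queries that compare it against text keep working alongside the numerical comparison.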


12 changes: 5 additions & 7 deletions tests/functional/filter/cram/filter-query-numerical.t
@@ -12,15 +12,13 @@ Create metadata file for testing.
   > SEQ_4
   > ~~
 
-Ideally, the 'coverage' column should be query-able by numerical comparisons.
-This does not currently work since the empty string is causing that column to be
-parsed as a non-numerical type.
+The 'coverage' column should be query-able by numerical comparisons.
 
   $ ${AUGUR} filter \
   > --metadata metadata.tsv \
   > --query "coverage >= 0.95" \
   > --output-strains filtered_strains.txt > /dev/null
-  ERROR: Internal Pandas error when applying query:
-    '>=' not supported between instances of 'str' and 'float'
-  Ensure the syntax is valid per <https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-query>.
-  [2]
+
+  $ sort filtered_strains.txt
+  SEQ_2
+  SEQ_3
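
The error removed from the expected output can be reproduced directly in pandas. This is a minimal sketch with made-up values; it compares a Series instead of running `augur filter`.

```python
import pandas as pd

# Made-up coverage values held as strings (object dtype), one of them empty.
coverage = pd.Series(["0.94", "0.95", "0.96", ""], dtype=object)

try:
    coverage >= 0.95
    error_message = None
except TypeError as error:
    # Comparing a str to a float is not defined in Python.
    error_message = str(error)

# After a hard conversion the comparison works; the empty value becomes NaN,
# which compares as False.
mask = pd.to_numeric(coverage) >= 0.95

print(error_message)
print(mask.tolist())
```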
