curate format-dates: Mask failed date strings
By default, completely mask date strings that failed date formatting
with "XXXX-XX-XX" so that they are still in the proper ISO 8601 date
format for downstream Augur commands. Users can turn off the masking
with the added `--no-mask-failure` option.

Note that `store_false` actions produce semi-misleading docs in the help
output, so the new option suppresses the automatic default value in the
help message with `SKIP_AUTO_DEFAULT_IN_HELP`, as suggested by @tsibley in review.
joverlee521 committed Jul 7, 2023
1 parent 0ca4896 commit 4d02220
Showing 3 changed files with 37 additions and 3 deletions.
14 changes: 14 additions & 0 deletions augur/argparse_.py
@@ -4,6 +4,20 @@
 from argparse import Action, ArgumentDefaultsHelpFormatter


+# Include this in an argument help string to suppress the automatic appending
+# of the default value by argparse.ArgumentDefaultsHelpFormatter. This works
+# because the automatic appending is conditional on the presence of %(default),
+# so we include it but then format it as a zero-length string .0s. 🙃
+#
+# Another solution would be to add an extra attribute to the argument (the
+# argparse.Action instance) and then subclass ArgumentDefaultsHelpFormatter to
+# condition on that new attribute, but that seems more brittle.
+#
+# Copied from the Nextstrain CLI repo
+# https://github.com/nextstrain/cli/blob/017c53805e8317951327d24c04184615cc400b09/nextstrain/cli/argparse.py#L13-L21
+SKIP_AUTO_DEFAULT_IN_HELP = "%(default).0s"
+
+
 def add_default_command(parser):
     """
     Sets the default command to run when none is provided.
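
The comment in the hunk above explains why `"%(default).0s"` suppresses the automatic default. The following is a minimal standalone sketch of that behavior, not code from this commit. The parser name `demo` and the two options `--plain` and `--quiet-default` are hypothetical and exist only for illustration.

```python
import argparse

# Same value as the constant added in this commit.
SKIP_AUTO_DEFAULT_IN_HELP = "%(default).0s"

# "%(default).0s" formats the default with a precision of 0,
# so whatever the default is, it renders as an empty string.
rendered = SKIP_AUTO_DEFAULT_IN_HELP % {"default": True}

parser = argparse.ArgumentParser(
    prog="demo",  # hypothetical program name
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)

# No "%(default)" in the help string, so the formatter appends
# " (default: True)" on its own.
parser.add_argument("--plain", action="store_false",
    help="Default is appended.")

# "%(default)" is present, so the formatter appends nothing, and the
# placeholder itself expands to a zero-length string.
parser.add_argument("--quiet-default", action="store_false",
    help="Default is hidden." + SKIP_AUTO_DEFAULT_IN_HELP)

help_text = parser.format_help()
print(help_text)
```

In the printed help, only `--plain` should carry a `(default: True)` suffix.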
9 changes: 9 additions & 0 deletions augur/curate/format_dates.py
@@ -5,6 +5,7 @@
 import re
 from datetime import datetime

+from augur.argparse_ import SKIP_AUTO_DEFAULT_IN_HELP
 from augur.errors import AugurError
 from augur.io.print import print_err
 from augur.types import DataErrorMethod
@@ -31,6 +32,10 @@ def register_parser(parent_subparsers):
         choices=list(DataErrorMethod),
         default=DataErrorMethod.ERROR_FIRST,
         help="How should failed date formatting be reported.")
+    optional.add_argument("--no-mask-failure", dest="mask_failure",
+        action="store_false",
+        help="Do not mask dates with 'XXXX-XX-XX' and return original date string if date formatting failed. " +
+             f"(default: False{SKIP_AUTO_DEFAULT_IN_HELP})")

     return parser
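
As a rough illustration of how the `--no-mask-failure` option above parses, here is a minimal sketch using a standalone parser. The parser name `demo` is hypothetical, and Augur's real parser registers many more options. With `action="store_false"` and `dest="mask_failure"`, the stored value defaults to `True` and becomes `False` only when the flag is passed, which is why an automatically appended `(default: True)` would describe `mask_failure` rather than the flag itself.

```python
import argparse

# Hypothetical standalone parser; only the option from this commit is registered.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--no-mask-failure", dest="mask_failure",
    action="store_false")

# Without the flag, masking stays on.
default_args = parser.parse_args([])

# With the flag, masking is turned off.
opted_out = parser.parse_args(["--no-mask-failure"])

print(default_args.mask_failure, opted_out.mask_failure)
```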

@@ -169,6 +174,10 @@ def run(args, records):

             formatted_date_string = format_date(date_string, args.expected_date_formats)
             if formatted_date_string is None:
+                # Mask failed date formatting before processing error methods
+                # to ensure failures are masked even when failures are "silent"
+                if args.mask_failure:
+                    record[field] = "XXXX-XX-XX"

                 if failure_reporting is DataErrorMethod.SILENT:
                     continue
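
The masking step in the hunk above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not Augur's implementation: `format_date` here is a stand-in built on `datetime.strptime` that does not handle incomplete dates the way the real function does, and `mask_failed_dates`, `MASK`, and the sample record are hypothetical names and data introduced for the example.

```python
from datetime import datetime

MASK = "XXXX-XX-XX"  # same mask string as in the commit


def format_date(date_string, expected_formats):
    """Simplified stand-in: return an ISO 8601 date, or None if no format matches."""
    for date_format in expected_formats:
        try:
            return datetime.strptime(date_string, date_format).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def mask_failed_dates(record, date_fields, expected_formats, mask_failure=True):
    """Hypothetical helper: format date fields and mask the ones that fail."""
    record = record.copy()
    failures = []
    for field in date_fields:
        date_string = record.get(field)
        if not date_string:
            continue
        formatted = format_date(date_string, expected_formats)
        if formatted is None:
            # Mask first, so the failure is masked however it is reported later.
            if mask_failure:
                record[field] = MASK
            failures.append((field, date_string))
            continue
        record[field] = formatted
    return record, failures


sample = {"collectionDate": "2020-01", "updateDate": "2020-07-18T00:00:00Z"}
formats = ["%Y-%m-%dT%H:%M:%SZ"]
fields = ["collectionDate", "updateDate"]

masked, failures = mask_failed_dates(sample, fields, formats)
unmasked, _ = mask_failed_dates(sample, fields, formats, mask_failure=False)
print(masked)
print(unmasked)
```

Masking before any reporting decision mirrors the ordering in the hunk: a caller that ignores `failures` entirely still receives the masked record.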
17 changes: 14 additions & 3 deletions tests/functional/curate/cram/format_dates.t
@@ -41,7 +41,7 @@ This is expected to fail with an error, so redirecting stdout since we don't car
   [2]

 Test output with unmatched expected date formats while warning on failures.
-This is expected to print warnings for failures and return the date strings in their original format.
+This is expected to print warnings for failures and return the masked date strings for failures.

   $ cat $TMP/records.ndjson \
   > | ${AUGUR} curate format-dates \
@@ -53,16 +53,27 @@ This is expected to print warnings for failures and return the date strings in t
   WARNING: Unable to format dates for the following (record, field, date string):
   (0, 'collectionDate', '2020-01')
   (0, 'releaseDate', '2020-01')
-  {"record": 1, "date": "2020-XX-XX", "collectionDate": "2020-01", "releaseDate": "2020-01", "updateDate": "2020-07-18"}
+  {"record": 1, "date": "2020-XX-XX", "collectionDate": "XXXX-XX-XX", "releaseDate": "XXXX-XX-XX", "updateDate": "2020-07-18"}

 Test output with unmatched expected date formats while silencing failures.
-This is expected to return the date strings in their original format.
+This is expected to return the masked date strings for failures.

   $ cat $TMP/records.ndjson \
   > | ${AUGUR} curate format-dates \
   > --date-fields "date" "collectionDate" "releaseDate" "updateDate" \
   > --expected-date-formats "%Y" "%Y-%m-%dT%H:%M:%SZ" \
   > --failure-reporting "silent"
+  {"record": 1, "date": "2020-XX-XX", "collectionDate": "XXXX-XX-XX", "releaseDate": "XXXX-XX-XX", "updateDate": "2020-07-18"}
+
+Test output with unmatched expected date formats while silencing failures with `--no-mask-failure`.
+This is expected to return the date strings in their original format.
+
+  $ cat $TMP/records.ndjson \
+  > | ${AUGUR} curate format-dates \
+  > --date-fields "date" "collectionDate" "releaseDate" "updateDate" \
+  > --expected-date-formats "%Y" "%Y-%m-%dT%H:%M:%SZ" \
+  > --failure-reporting "silent" \
+  > --no-mask-failure
   {"record": 1, "date": "2020-XX-XX", "collectionDate": "2020-01", "releaseDate": "2020-01", "updateDate": "2020-07-18"}

 Test output with multiple matching expected date formats.