Report missing variables in cprnc #2988

billsacks · 2019-01-24T18:56:17Z

This PR changes cprnc and the test system so that, if variables are
missing from one of the files (the baseline file or the new file), this
will be reported as a baseline failure, like this:

FAIL SMS_D_Ld3.f10_f10_musgs.I1850Clm50BgcCrop.cheyenne_intel.clm-default BASELINE ctsm1.0.dev025: FIELDLIST field lists differ

For cprnc itself, this changes the final status of cprnc so that, if
variables are missing from one of the files, the final status is not
"IDENTICAL": this final status will now look like:

diff_test: the two files DIFFER only in their field lists

In addition, the number of variables present in file2 but not file1 is
now reported in the final counts.

If there are both missing variables and differences in common variables,
cprnc's final result is the same as before - reporting a DIFF in this
case. i.e., this new result is only reported as the final status if all
common variables are identical: a real DIFF takes precedence over a
field list difference.

This PR also changes the python scripts so that they respond
appropriately to this new possible cprnc result: During baseline
comparisons, if the only differences are in field lists (no differences
in the actual values of variables), then the baseline phase will still
report a FAIL, but a comment will be added with a keyword, FIELDLIST and
a bit of description (similarly to what we do for BFAILs), facilitating
filtering out these FIELDLIST differences if they were expected. As with
the behavior of cprnc itself, FIELDLIST is only reported if the only
differences are in field lists: if there are any real differences, then
the output is the same as before (so that you won't miss any real
differences if you do something like grep -v FIELDLIST). (If there are
both FIELDLIST and BFAIL results for a given test, but no actual
differences in values, then we report MULTIPLE, so that neither grep -v FIELDLIST nor grep -v BFAIL would accidentally exclude the result.)

Test suite:

scripts_regression_tests on cheyenne with both an old and new build
of cprnc (ran on 63377a8)
manual tests of a test with baseline comparison with fieldlist diffs,
and a test with both fieldlist and real diffs: checked TestStatus.log
and TestStatus in these cases
tests in tools/cprnc (via run_tests in that directory, with baseline
comparisons against master version): ensured that the only diffs are
as expected, and examined output of the new tests there (side-note
about the new attribute-comparison test: attribute comparisons are
currently asymmetrical: cprnc currently doesn't detect attributes
present on file2 not on file1)

Test baseline: For cprnc's run_tests: 5fa1832
Test namelist changes: none
Test status: bit for bit

Fixes #870

User interface changes?: none

Update gh-pages html (Y/N)?: N

Code review:

Test suite: None Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

Test suite: cprnc run_tests Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

This will make it easier to add the new feature Test suite: cprnc run tests Test baseline: previous commit Test namelist changes: N/A Test status: all identical Fixes: None User interface changes?: No Code review: None

Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: identical except for the missing_variable test Fixes: None User interface changes?: No Code review: None

This method will be easier to extend to variables not found on file1 Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: all identical Fixes: None User interface changes?: No Code review: None

For now, this outputs 0 Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: identical except for new line Fixes: None User interface changes?: No Code review: None

Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: identical except for expected diffs in diffs_in_vals_and_extra_and_missing.nc.out and extra_variable.nc.out Fixes: None User interface changes?: No Code review: None

This better tests the accumulation code Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: identical except for expected diffs in two tests Fixes: None User interface changes?: No Code review: None

Test suite: None Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

…list_change This brings in some additional cprnc tests These changes are unrelated to the other changes on the branch, but I am including them for convenience, because both involve increasing the test coverage of the cprnc run_tests

Resolved Conflicts: tools/cprnc/README

Make it so differences in field lists will be reported specially in TestStatus.log and TestStatus.

Fail if we can't find one of the expected strings in the cprnc output. Point is: if the cprnc Fortran code is changed to change the format of one of these strings without corresponding changes in the python, we want to fail rather than silently doing the wrong thing. Also refactor this logic a bit.

billsacks · 2019-01-24T18:56:32Z

Some additional notes for reviewers:

First: I liberally added a number of reviewers here. If you're not
interested in reviewing this, feel free to remove yourself from the
list.

This PR was originally opened nearly 3 years ago here
CESM-Development#442, but was never
reviewed there. I have now merged this up to the latest version of
master and added code to the scripts to respond appropriately to the new
cprnc behavior.

I am not personally attached to the behavior change in this PR: I'm
ambivalent about whether a change in the field list should be reported
as a difference, with my concern being that this is going to lead to
differences being reported quite frequently, when there are expected
additions or removals of diagnostic fields. People will need to get used
to filtering out these expected differences. However, when I first
implemented this, it was at the request of a number of CSEG members
(chiefly @gold2718 and @mnlevy1981): the general agreement in CSEG at
the time was that this change was desirable. If people now feel
differently, I'm fine with this PR being closed (though I'll probably
try to salvage some of the mostly-unrelated pieces, like the added cprnc
tests).

If we do go with this behavior change, I'm not tied to the details of
how these differences are reported, and am open to suggestions here.

I'm not sure if we should have a new fake test type that triggers this
behavior, similar to TESTRUNDIFF? I'm not sure exactly how to do this,
but I think we could add a test to scripts_regression_tests vaguely
similar to test_bless_test_results (but probably only doing a subset
of what's done there). However, even if this is worth doing, it can't be
done until all of the cime test machines have updated versions of
cprnc. (If we wanted to go crazy, we could also test various
combinations of results: e.g., DIFF in one file and FIELDLIST in
another, and mixing BFAILs into this too... but I feel that might be
more trouble than it's worth, given that those combos are already tested
via doctests of get_ts_synopsis.)

This should be backwards compatible with old versions of cprnc:

If using new scripts with old cprnc build: the new string will never
appear in cprnc output; fieldlist diffs will be treated as a pass, as
they have been until now
Conversely, if using old scripts with new cprnc build: If there are
fieldlist diffs, this will be reported the same as a true diff: will
trigger a FAILed BASELINE result, and won't have the FIELDLIST
comment; this seems acceptable to me

jgfouca

Looks great, thanks for the good documentation and testing!

ekluzek · 2019-01-24T19:08:46Z

This would often be a useful feature. Most often you'll know about these changes, so can go through and figure it out.

jedwards4b · 2019-01-24T19:08:59Z

@billsacks I've removed myself as assignee since I am headed out on travel today.

billsacks · 2019-01-30T21:27:41Z

Adding @rljacob as a reviewer. He's going to run this by the E3SM integrators to see if they object to this change. (The general feeling on today's cime telecon is that this is a good idea.)

mnlevy1981 · 2019-02-08T16:12:50Z

I'm happy to hear these changes are going in, and I like everything mentioned in the first comment about the PR, but I don't have time to go through the code and offer any specific thoughts so I've removed myself from the reviewer list. Thanks from incorporating these changes, though!

rljacob

E3SM approves.

@jedwards4b

cprnc: allow differences in field lists for time-constant fields In cprnc: For files with an unlimited (time) dimension: Separately count (1) missing time-varying fields and (2) missing time-constant fields. Only (1) is considered in determining whether to report a final difference in the field lists. Before this, no distinction was made between time-varying vs. time-constant fields in counting the number of missing variables. (These counts were added in #2988). However, that led to failures in some exact restart tests, because some time-constant fields were on output files from one case but not the other (see #3007). Here is sample output from cprnc for a few cases: (1) Difference in the presence / absence of time-varying variables: SUMMARY of cprnc: A total number of 6 fields were compared of which 0 had non-zero differences and 0 had differences in fill patterns and 0 had different dimension sizes A total number of 0 fields could not be analyzed A total number of 2 time-varying fields on file 1 were not found on file 2. A total number of 0 time-constant fields on file 1 were not found on file 2. A total number of 0 time-varying fields on file 2 were not found on file 1. A total number of 0 time-constant fields on file 2 were not found on file 1. diff_test: the two files DIFFER only in their field lists (2) Difference in the presence / absence of time-constant variables, for files that have a time dimension: SUMMARY of cprnc: A total number of 13 fields were compared of which 0 had non-zero differences and 0 had differences in fill patterns and 0 had different dimension sizes A total number of 0 fields could not be analyzed A total number of 0 time-varying fields on file 1 were not found on file 2. A total number of 2 time-constant fields on file 1 were not found on file 2. A total number of 0 time-varying fields on file 2 were not found on file 1. A total number of 1 time-constant fields on file 2 were not found on file 1. diff_test: the two files seem to be IDENTICAL (But note that there were differences in field lists just for time-constant fields.) (3) Difference in the presence / absence of time-constant variables, for files that do NOT have a time dimension (note that this still results in a DIFFER result; it seemed to me that that's what a user would want in this case): SUMMARY of cprnc: A total number of 5 fields were compared of which 0 had non-zero differences and 0 had differences in fill patterns and 0 had different dimension sizes A total number of 0 fields could not be analyzed A total number of 2 fields on file 1 were not found on file 2. A total number of 1 fields on file 2 were not found on file 1. diff_test: the two files DIFFER only in their field lists The downside of this solution is that we wouldn't catch removals of time-constant fields when doing baseline comparisons, and might overlook this for interactive uses of cprnc. To address the issue of baseline comparisons, I thought about adding a flag that treats time-constant fieldlist diffs as differences (#3007 (comment)), but I have not done this here because this would have taken more time to implement and added significantly more complexity (in both the cprnc Fortran and in hist_utils.py) for uncertain benefit. (See the discussion in #3007 for more thoughts on this.) Once this PR is approved and merged, I will update the cprnc builds on hobart and cheyenne (currently they are using builds prior to #2988). Test suite: Ran the two tests noted as failing in #3007, pointing to a version of cprnc built from this branch; these tests now pass. scripts_regression_tests on cheyenne Test baseline: n/a Test namelist changes: none Test status: bit for bit Fixes #3007 User interface changes?: Changes cprnc behavior slightly, as noted above Update gh-pages html (Y/N)?: N Code review: @jedwards4b @jgfouca @fischer-ncar Also cc @rljacob @gold2718 @mnlevy1981 @ekluzek

Update Spectrum MPI module for PGI, XL and GNU. Update XL version which has compatible parallel-netcdf install. Update parallel-netcdf module. More updates: Add basic CXX settings to enable PIO2 builds. Fixes #2988 Backup config for home dir based builds in comments. ** Add options for HOME directory based builds in case performance issues are encountered with network file system. This is not enabled by default as space may not be sufficient Upgrade PGI to 19.4 ** Fixes #2980 slow compilation of CAM with PGI on Summit.

billsacks added 15 commits December 23, 2015 10:26

Add a test of attribute differences

b69dd07

Test suite: None Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

Add a cprnc test of character variables

68f0b94

Test suite: None Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

Add 3 new cprnc tests

703ced7

Test suite: cprnc run_tests Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

Refactor logic for printing IDENTICAL or DIFFERENT result

a7958f3

This will make it easier to add the new feature Test suite: cprnc run tests Test baseline: previous commit Test namelist changes: N/A Test status: all identical Fixes: None User interface changes?: No Code review: None

Final status of DIFFER if fields missing from file2

3370cbc

Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: identical except for the missing_variable test Fixes: None User interface changes?: No Code review: None

Add output line: number of fields not found on file1

cb8f0a0

For now, this outputs 0 Test suite: cprnc run_tests Test baseline: previous commit Test namelist changes: N/A Test status: identical except for new line Fixes: None User interface changes?: No Code review: None

Update README

5584718

Test suite: None Test baseline: N/A Test namelist changes: N/A Test status: N/A Fixes: None User interface changes?: No Code review: None

Merge branch 'master' into cprnc_report_fieldlist_change

c188bc0

Resolved Conflicts: tools/cprnc/README

Handle cprnc field list differences in scripts

91f0d3b

Make it so differences in field lists will be reported specially in TestStatus.log and TestStatus.

Make formatting consistent

63377a8

billsacks added ty: enhancement tp: system tests labels Jan 24, 2019

billsacks assigned jedwards4b Jan 24, 2019

billsacks requested review from gold2718, mnlevy1981, jedwards4b, jgfouca and fischer-ncar January 24, 2019 18:56

jgfouca approved these changes Jan 24, 2019

View reviewed changes

jedwards4b approved these changes Jan 24, 2019

View reviewed changes

jedwards4b assigned billsacks and unassigned jedwards4b Jan 24, 2019

fischer-ncar approved these changes Jan 24, 2019

View reviewed changes

billsacks requested a review from rljacob January 30, 2019 21:27

mnlevy1981 removed their request for review February 8, 2019 16:11

rljacob approved these changes Feb 8, 2019

View reviewed changes

jedwards4b merged commit 51ab1c1 into ESMCI:master Feb 11, 2019

billsacks deleted the cprnc_report_fieldlist_change branch February 12, 2019 15:48

fischer-ncar mentioned this pull request Feb 20, 2019

ERS and ERP tests failing with updated cprnc #3007

Closed

klindsay28 mentioned this pull request Feb 21, 2019

cs.status/cimeteststatus no longer reporting reporting which components fail baseline comparison tests #884

Closed

billsacks mentioned this pull request Mar 22, 2019

cprnc: allow differences in field lists for time-constant fields #3051

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report missing variables in cprnc #2988

Report missing variables in cprnc #2988

billsacks commented Jan 24, 2019 •

edited

Loading

billsacks commented Jan 24, 2019

jgfouca left a comment

ekluzek commented Jan 24, 2019

jedwards4b commented Jan 24, 2019

billsacks commented Jan 30, 2019

mnlevy1981 commented Feb 8, 2019

rljacob left a comment

Report missing variables in cprnc #2988

Report missing variables in cprnc #2988

Conversation

billsacks commented Jan 24, 2019 • edited Loading

billsacks commented Jan 24, 2019

jgfouca left a comment

Choose a reason for hiding this comment

ekluzek commented Jan 24, 2019

jedwards4b commented Jan 24, 2019

billsacks commented Jan 30, 2019

mnlevy1981 commented Feb 8, 2019

rljacob left a comment

Choose a reason for hiding this comment

billsacks commented Jan 24, 2019 •

edited

Loading