-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ERS and ERP tests failing with updated cprnc #3007
Comments
This would suggest that the restart causes fields to be written to the history that were not written from a cold start - does this make sense? What fields are they? @billsacks ? |
Shoot, sorry that I didn't think of this, or do the active-component testing that would have uncovered it. CLM outputs a handful of "time-constant" fields to only its first history file, hence this issue. I can think of a few solutions:
I'm not sure what's best here... or maybe there are other solutions I didn't think of. Also cc @ekluzek |
How time-critical is it to get this fix in? One short-term solution would be to back up to the old builds of cprnc while we come up with a long-term solution. |
Would it be possible to ignore field list differences for time constant fields? I've already updated cprnc on a number of platforms. |
It'll only be time-critical when I need to start the next round of tests. |
I think we can note these as expected failures until we arrive at a solution. |
It's easy enough to manually check the cprnc output to verify if the tests truly failed. |
We don't have an "expected failure" mechanism. @jgfouca has this new cprnc made it back to E3SM yet? |
No, I'm pretty sure it hasn't and even if it has you also have to build and install it on the system. |
What do you think about this fix:
This would require changes in the cprnc fortran (so would require cprnc to be rebuilt) and in CLM and E3SM, but I think the changes would be fairly straightforward. I like this better than my solutions (3) or (4) because it would be easier for other components to implement this if necessary. |
I guess I don't really like the idea of modifying history files to accommodate cprnc, what about incorporating some logic regarding variables without an unlimited dim - compare them if they exist in both files, ignore them if they are only present in one? |
I agree with @jedwards4b, that seems like a more robust solution, and would only require messing with cprnc. |
I'm hesitant to introduce general logic that ignores differences in field lists for time-constant variables, because cprnc is a general-purpose tool and it could be really confusing / misleading if it avoids checking the presence/absence of time-constant variables in general. However, I could imagine introducing a flag to cprnc like '--allow-fieldlist-diffs-for-vars-with-no-time' (that's a mouthful; any suggestions for a better name?), which, if set, would do what @jedwards4b suggests. We would set that flag in cprnc calls from hist_utils.py. Does that sound reasonable to you? |
yes --allow-missing-constant-fields? |
Thanks, @jedwards4b . "constant" could mean a number of things, but maybe "--allow-missing-timeconst-fields"? |
This is also causing a failure for
due to these fields:
I think @jedwards4b 's suggested fix will fix this one, too. cc @negin513 |
I realized a problem with my suggestion of adding a flag like I may also add a flag that turns on the counting of these time-constant fields, like I need to make progress on some other time-critical things this week, so probably won't get to this until next week. |
For files with an unlimited (time) dimension, provide two separate counts: (1) The number of missing fields with an unlimited (time) dimension (2) The number of missing fields without an unlimited (time) dimension Only (1) is considered in determining whether to report a final difference in the field lists. Resolves ESMCI#3007
cprnc: allow differences in field lists for time-constant fields In cprnc: For files with an unlimited (time) dimension: Separately count (1) missing time-varying fields and (2) missing time-constant fields. Only (1) is considered in determining whether to report a final difference in the field lists. Before this, no distinction was made between time-varying vs. time-constant fields in counting the number of missing variables. (These counts were added in #2988). However, that led to failures in some exact restart tests, because some time-constant fields were on output files from one case but not the other (see #3007). Here is sample output from cprnc for a few cases: (1) Difference in the presence / absence of time-varying variables: SUMMARY of cprnc: A total number of 6 fields were compared of which 0 had non-zero differences and 0 had differences in fill patterns and 0 had different dimension sizes A total number of 0 fields could not be analyzed A total number of 2 time-varying fields on file 1 were not found on file 2. A total number of 0 time-constant fields on file 1 were not found on file 2. A total number of 0 time-varying fields on file 2 were not found on file 1. A total number of 0 time-constant fields on file 2 were not found on file 1. diff_test: the two files DIFFER only in their field lists (2) Difference in the presence / absence of time-constant variables, for files that have a time dimension: SUMMARY of cprnc: A total number of 13 fields were compared of which 0 had non-zero differences and 0 had differences in fill patterns and 0 had different dimension sizes A total number of 0 fields could not be analyzed A total number of 0 time-varying fields on file 1 were not found on file 2. A total number of 2 time-constant fields on file 1 were not found on file 2. A total number of 0 time-varying fields on file 2 were not found on file 1. A total number of 1 time-constant fields on file 2 were not found on file 1. diff_test: the two files seem to be IDENTICAL (But note that there were differences in field lists just for time-constant fields.) (3) Difference in the presence / absence of time-constant variables, for files that do NOT have a time dimension (note that this still results in a DIFFER result; it seemed to me that that's what a user would want in this case): SUMMARY of cprnc: A total number of 5 fields were compared of which 0 had non-zero differences and 0 had differences in fill patterns and 0 had different dimension sizes A total number of 0 fields could not be analyzed A total number of 2 fields on file 1 were not found on file 2. A total number of 1 fields on file 2 were not found on file 1. diff_test: the two files DIFFER only in their field lists The downside of this solution is that we wouldn't catch removals of time-constant fields when doing baseline comparisons, and might overlook this for interactive uses of cprnc. To address the issue of baseline comparisons, I thought about adding a flag that treats time-constant fieldlist diffs as differences (#3007 (comment)), but I have not done this here because this would have taken more time to implement and added significantly more complexity (in both the cprnc Fortran and in hist_utils.py) for uncertain benefit. (See the discussion in #3007 for more thoughts on this.) Once this PR is approved and merged, I will update the cprnc builds on hobart and cheyenne (currently they are using builds prior to #2988). Test suite: Ran the two tests noted as failing in #3007, pointing to a version of cprnc built from this branch; these tests now pass. scripts_regression_tests on cheyenne Test baseline: n/a Test namelist changes: none Test status: bit for bit Fixes #3007 User interface changes?: Changes cprnc behavior slightly, as noted above Update gh-pages html (Y/N)?: N Code review: @jedwards4b @jgfouca @fischer-ncar Also cc @rljacob @gold2718 @mnlevy1981 @ekluzek
For files with an unlimited (time) dimension, provide two separate counts: (1) The number of missing fields with an unlimited (time) dimension (2) The number of missing fields without an unlimited (time) dimension Only (1) is considered in determining whether to report a final difference in the field lists. Resolves ESMCI/cime#3007
The ERS and ERP tests are failing the restart compare for compsets with active clm and cam. This
happens with PR #2988.
FAIL ERP_Ln9.f09_f09_mg17.F1850.cheyenne_intel.cam-outfrq9s COMPARE_base_rest
Tail of output for clm history files.
SUMMARY of cprnc:
A total number of 238 fields were compared
of which 0 had non-zero differences
and 0 had differences in fill patterns
and 0 had different dimension sizes
A total number of 2 fields could not be analyzed
A total number of 0 fields on file 1 were not found on file 2.
A total number of 8 fields on file 2 were not found on file 1.
diff_test: the two files DIFFER only in their field lists
The text was updated successfully, but these errors were encountered: