FATES_ERRH2O_SCPF fails COMPARE_base_rest #701
I'm wondering if this is not a reasonable expectation for some error checks, particularly the small ones where the absolute values are close to machine precision, and thus the relative differences can be large.
@rgknox I'm not sure if I understand your thinking, but it shouldn't matter if the absolute value is close to machine precision. Machine precision of ~1e-16 means that values of 1 and (1+1e-16) are indistinguishable. But double precision values can represent values down to about 1e-308. If you have a value like 1e-17, then that's indistinguishable from (1e-17 + 1e-33), but it should be very distinguishable from, say, 1.1e-17.
Yes, this looks like a legit issue to me. We do expect exact results between the baseline and restart cases for all history fields. And actually the error is large compared to the average values of the data. Does FATES_ERRH2O_SCPF need to be added to the restart file?
Thanks for your responses! I think my terminology was bad in my previous comment: I don't mean machine precision, I mean floating-point math error. In this case it is the normalized RMS that is tripping the fail, right? My worry is that if we can expect math errors of ~1e-17 or greater (which seems reasonable, but maybe I'm wrong), then the normalized difference when this amount of error is compared to such a small number is going to be large. For instance, the normalized RMS when two numbers differ by 1e-17 (due to math errors), while their mean is 1e-17, is 1.

There is also something else I'm wondering. Let's assume that the "state" of the model, i.e. the values of all variables that have memory, is exactly the same in one simulation at the beginning of the restart initialization as in the original (i.e. to the precision of the floating-point values it holds). If the same sequence of math operations is then performed on these numbers in both the restart and the base, should we expect the exact same rounding errors in both cases? That is, should both base and restart have the same error bias and match, and thus produce a more identical result than one where we expect math error?
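The concern about the metric blowing up at small magnitudes can be sketched numerically. Note the `normalized_rms` function below is an assumed form of the comparison metric (RMS of the differences divided by the mean magnitude), not the actual test-suite implementation:

```python
import math

def normalized_rms(x, y):
    """Assumed metric: RMS of elementwise differences, normalized
    by the mean absolute magnitude of the two fields."""
    n = len(x)
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    mean_mag = sum(abs(a) + abs(b) for a, b in zip(x, y)) / (2 * n)
    return rms / mean_mag

# Fields whose values are themselves ~1e-17, differing by a
# roundoff-level ~1e-17 between base and restart:
base    = [1e-17, 2e-17, 1e-17]
restart = [2e-17, 1e-17, 2e-17]

print(normalized_rms(base, restart))  # order 1, despite a tiny absolute error
```

A difference that is negligible in absolute terms registers as an order-one normalized difference because the normalization is itself of the same magnitude.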
@rgknox okay, I think I understand what you're getting at here, but may still be misunderstanding. I think you're saying that this ERR field is the small difference between two relatively large fields. So we have something like ERR = (A - B). Let's say A and B are both order 1, and the difference between the two is itself roundoff-level, so ERR is order 1e-16. If you then introduce a roundoff-level change in A, then ERR will change by order 1e-16, so normalized differences in ERR will be order 1. If that's what you're saying, then I completely agree. Indeed, we often see big relative differences in various ERR terms when we make roundoff-level changes in the model. However, that all applies to comparisons between a case and a baseline. For restart comparisons, @ekluzek's point holds: we expect the restart to be bit-for-bit in all fields. As for your question:
As far as I know, the answer is yes, we should expect the exact same rounding errors in both cases. I believe this is specified by the IEEE floating point standard. It's possible that some compiler flags would cause this not to be the case (and not follow the IEEE standard) for performance reasons, but we typically avoid those compiler flags, if they exist. So I wonder if there may be a slightly different code path that's followed in the base vs. restart case here that causes the roundoff behavior to differ.
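The ERR = (A - B) amplification described above is easy to reproduce. A minimal sketch with made-up order-1 values (the names A, B, ERR follow the discussion, not actual FATES variables):

```python
# Two order-1 quantities that agree to roundoff, so their
# difference ERR is itself roundoff-level (~1e-16):
A = 1.0
B = 1.0 - 1e-16
err_base = A - B                   # ~1e-16

# Introduce a roundoff-level relative change in A:
A_perturbed = A * (1.0 + 2e-16)
err_new = A_perturbed - B

rel_change = abs(err_new - err_base) / abs(err_base)
print(rel_change)  # order 1: a tiny absolute change in A is a
                   # large *relative* change in the cancelled quantity ERR
```

This is why roundoff-level model changes routinely produce large relative diffs in ERR-type diagnostics against a baseline, while a restart comparison, which should replay bit-for-bit identical operations, is still expected to match exactly.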
Note that this issue appears to have been fixed for the original test on cheyenne with the intel compiler, but is now showing up for the same test on izumi with the nag compiler: #854 (comment) |
Reopening due to the reappearance of this issue with the testing for ESCOMP/CTSM#1849. |
This error was discovered after correcting another error in the `FatesHydro` test mod in which no history files were being generated for the `ERS_D_Ld5.1x1_brazil.I2000Clm50FatesCruGs.cheyenne_intel.clm-FatesHydro` test case (see ESCOMP/CTSM#1160). The test case runs, but fails the restart comparison on `FATES_ERRH2O_SCPF` only.