Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistency in c12 maintenance respiration calculation #3257

Merged
merged 1 commit into from
Mar 17, 2020

Conversation

jinyuntang
Copy link
Contributor

@jinyuntang jinyuntang commented Oct 24, 2019

For ELM, when use_pheno_flux_limiter=.true., the components
of maintance respiration are updated to avoid negative carbon
state variables. However, the total maintenance respiration
is not updated accordingly, causing an overesimation of total
maintenance respiration mr. Now this bug is fixed by ensuring
maintenance respiration is always recalculated by summing up
its components.

[non-BFB] due to very tiny value changes.

Fixes #3256

@jinyuntang
Copy link
Contributor Author

@bishtgautam, since for all the tests that are tracked, use_pheno_flux_limiter=.false. and the fix is only by removing one if else structure, I did not do the whole package of tests.

@@ -8070,12 +8070,12 @@ subroutine veg_cf_summary(this, bounds, num_soilp, filter_soilp, num_soilc, filt
this%psnshade_to_cpool(p)

! maintenance respiration (MR)
if ( trim(isotope) == 'c13' .or. trim(isotope) == 'c14') then
!if ( trim(isotope) == 'c13' .or. trim(isotope) == 'c14') then
Copy link
Contributor

@bishtgautam bishtgautam Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete the line instead of leaving it as a comment and reindent the following lines. thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as you suggested. Thanks

@rljacob
Copy link
Member

rljacob commented Oct 25, 2019

@jinyuntang The description needs a couple of fixes. Put brackets [] around BFB. The reference to the issue should just say "Fixes #3256" to autoclose the issue. Add blank lines around the "Fixes" line.

@jinyuntang
Copy link
Contributor Author

@rljacob. Changes were made per your suggestion. Thanks

bishtgautam added a commit that referenced this pull request Oct 30, 2019
For ELM, when use_pheno_flux_limiter=.true., the components
of maintance respiration are updated to avoid negative carbon
state variables. However, the total maintenance respiration
is not updated accordingly, causing an overesimation of total
maintenance respiration mr. Now this bug is fixed by ensuring
maintenance respiration is always recalculated by summing up
its components.

[BFB] when use_pheno_flux_limiter=.false.

Fixes #3256
@bishtgautam bishtgautam added the BFB PR leaves answers BFB label Oct 30, 2019
@bishtgautam
Copy link
Contributor

@jinyuntang Is this PR expected to cause differences in for BGCEXP_BCRC_CNPRDCTC_1850 compset? The SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850 test is showing some DIFFs on sandiatoss3. These differences are in fco2:

 x2a_Fall_fco2_lnd   (x2a_nx,x2a_ny,time)  t_index =      1     1
          1    48602  ( 12351,     1,     1) ( 27635,     1,     1) ( 31007,     1,     1) ( 31007,     1,     1)
               48602   3.914762924921206E-09  -3.934912527075549E-10 4.0E-28 -1.186405063692956E-12 7.0E-21 -1.186405063692956E-12
               48602   3.914762924921206E-09  -3.934912527075549E-10         -1.186405063692956E-12         -1.186405063692956E-12
               48602  ( 12351,     1,     1) ( 27635,     1,     1)
          avg abs field values:    9.274471606793769E-12    rms diff: 1.8E-30   avg rel diff(npos):  7.0E-21
                                   9.274471606793769E-12                        avg decimal digits(ndif): 15.5 worst: 15.5
 RMS x2a_Fall_fco2_lnd                1.8321E-30            NORMALIZED  1.9754E-19

 l2x_Fall_fco2_lnd   (l2x_nx,l2x_ny,time)  t_index =      1     1
          1    48602  ( 12351,     1,     1) ( 27635,     1,     1) ( 31007,     1,     1) ( 31007,     1,     1)
               16368   3.914762924921206E-09  -3.934912527075549E-10 3.2E-27 -1.429947752075437E-11 1.4E-20 -1.429947752075437E-11
               16368   3.914762924921206E-09  -3.934912527075549E-10         -1.429947752075437E-11         -1.429947752075437E-11
               48602  ( 12351,     1,     1) ( 27635,     1,     1)
          avg abs field values:    4.891403430389996E-11    rms diff: 2.5E-29   avg rel diff(npos):  1.4E-20
                                   4.891403430389996E-11                        avg decimal digits(ndif): 15.6 worst: 15.6
 RMS l2x_Fall_fco2_lnd                2.5256E-29            NORMALIZED  5.1633E-19

Looking at the lnd_import_export.F90,

       if (index_l2x_Fall_fco2_lnd /= 0) then
          l2x(index_l2x_Fall_fco2_lnd,i) = -lnd2atm_vars%nee_grc(g)  
       end if

Will your changes modify NEE?

@jinyuntang
Copy link
Contributor Author

@bishtgautam, after thinking this again, I think it will potentially affect nee, and mr. Basically, what it does is the following, elm separates mr into two parts
mr2 = mr-m1
However, in certain cases, mr2 will be too large causing numerical trouble, and use_pheno_flux_limiter=.true. meant to fix this.

In current practice, such numerical trouble is not tracked, but in my fix I recalculated
mr=mr1+m2
at the end. So difference in memory (or the limited precision provided by the hardware) will cause very small differences.

Nonetheless, this fix will ensure consistency between the bulk carbon and carbon isotopes when C13 and/or C14 is turned on. Perhaps, @kvcalvin and @bpbond can give your opinion on this?

@bishtgautam
Copy link
Contributor

Can you check if your PR is non-bfb for SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850?

@jinyuntang
Copy link
Contributor Author

@bishtgautam, will do.

@rljacob
Copy link
Member

rljacob commented Nov 1, 2019

What is the status on this? We have a diff in our testing that we need to know what to do with.

@bishtgautam
Copy link
Contributor

I'm going to revert this PR from next.

bishtgautam added a commit that referenced this pull request Nov 1, 2019
@jinyuntang
Copy link
Contributor Author

@rljacob, I have pinged @bpbond and @kvcalvin for their input, but haven't heard anything yet. It's bug needs to be fixed though.

@bpbond
Copy link

bpbond commented Nov 5, 2019

@jinyuntang (or anyone else), what was the original point of only re-calculating MR for 13C or 14C? Is the use_pheno_flux_limiter=.true. logic not normally used? I'm confused why this problem wasn't exposed before now. Thanks.

@jinyuntang
Copy link
Contributor Author

@bpbond, use_pheno_flux_limiter=.true. was used to correct the negative state variable issue we discussed deliberately early this year. When use_pheno_flux_limiter=.false. by default, the inconsistency was not detected, and before our BGC experiment, nor the SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850 test existed to reveal the difference. Further, I don't think the carbon isotopes are in any of the test either. Nonetheless, I think it is an import thing to fix.

@rljacob
Copy link
Member

rljacob commented Dec 5, 2019

@bpbond any further input on this PR?

@rljacob
Copy link
Member

rljacob commented Jan 27, 2020

@bishtgautam what is the hold-up on this PR? Everyone has approved it.

@bishtgautam
Copy link
Contributor

@jinyuntang Should this PR be marked non-BFB based on my earlier comment?

@jinyuntang
Copy link
Contributor Author

@bishtgautam, it is potentially non-BFB when there are tiny discrepancies occur. Such as b=a-c, but a is necessarily recalculated as b+c and used in mass balance.

@thorntonpe
Copy link
Contributor

There has not been a demonstration of a problem that needed to be fixed in the CTC implementation, as far as I recall. The only problem appeared when using ECA. So I am not comfortable with this being non-BFB for the CTC case.

@thorntonpe
Copy link
Contributor

I think there needs to be a quantification of the non-BFB behavior for CTC, at least.

@jinyuntang
Copy link
Contributor Author

@thorntonpe, in this case, I can screen it off for CTC, although I do suspect when people are using isotopes with CTC, such problems are buried in with unknown consequences.

@jinyuntang
Copy link
Contributor Author

@thorntonpe, maybe I should reach out to Xiaojuan to conduct a robust assessment with CTC?
Thanks.

@thorntonpe
Copy link
Contributor

@jinyuntang I'm confused about the non-BFB behavior that @bishtgautam flags for the CTC test. Based on your earlier comments, all of the current tests have use_pheno_flux_limiter = false, in which case no tests should be failing as non-BFB. The fact that one of the CTC tests fails is a concern, since your changes should not be impacting any of that code in the default test configuration. I also am not comfortable with including the flux limiter in CTC runs even when it is turned on - it has only ever been shown to be a problem for ECA. We did some extensive checking on this in CTC when your first brought the issue to our attention, and never saw a problem. So I would prefer that, for now, any changes you are making be limited to the ECA configuration.
Given all that, I don't think it is necessary to conduct the CTC assessment with Xiaojuan that you mention above. The first step should be to address the question from @bishtgautam , which is: why does this PR cause non-BFB behavior in the CTC case? The code changes are not supposed to be active in any of the standard tests, so there should be no BFB failures.
The second step should be, if you want to bring this into ECA, then put the isolation in place to prevent it having an impact on CTC even when the use_pheno_flux _limiter is turned on.

@jinyuntang
Copy link
Contributor Author

@thorntonpe, thanks for describing you concern in detail. I will try to clarify it below.
First, I think it is quite common to see the difference due to the limited precision the hardware can support. For instance, if you compare the value of a in the following lines
c=a-b
a=b+c
They are different with many compilers on many machines.

Second, this situation only caused very small difference, as @bishtgautam found. In fact, their values are really small, e.g.
rms diff: 2.5E-29 avg rel diff(npos): 1.4E-20
avg decimal digits(ndif): 15.6 worst: 15.6
which are just as expected.

Second, I would like to confirm that use_pheno_flux_limiter=.true. is not used for CTC, nor by default for ECA. It is introduced only to avoid the negative value failure we discussed intensively last year.

Now the question is are the CTC team comfortable with the tiny difference reported in the failure? Or I have to do a long simulation test to get a better idea about the implication of this difference?

Thanks.

@thorntonpe
Copy link
Contributor

@jinyuntang I'm sorry if I am slow to understand the details of this PR. At a high level, I'm going on the title of the PR to understand its intended behavior. The title says that this PR "fixes a C mass balance error when use_pheno_flux_limiter=.true." If the default behavior for any CTC case is use_pheno_flux_limiter=.false., then I still don't understand why the PR has a non-BFB behavior, even a small one, for any CTC case. It seems like the PR is affecting code behavior that it should not be touching.

@jinyuntang
Copy link
Contributor Author

@thorntonpe no worry. But I think there is still some confusion existing for the case there. And I'd like to explain it again.
For floating variable calculations, small differences are expected for the following situation. For instance, assuming
a=4.0
b=3.0
c=a-b !so c=1.0 ideally, but could be 0.99999999999999999
a2=c+b !this will very likely produce a number that diffs slightly from 4.0
then a2 == a will be false due to the very small difference caused by the limited precision the hardware can provide.

Now in the ELM code that does the calculation of maintenance respiration, for c12, a total respiration is first computed, then it is separated into different comments, when applied to c13 or c14, it however, the total maintenance respiration is not calculated first, rather it is summarized after all components are derived from c12 component calculation. To ensure the consistency, the c12 maintenance respiration should be recalculated as the sum of its components. But it is not done by default. The option use_pheno_flux_limiter just accidentally discovered this hidden inconsistency, even though it did not mean to touch any CTC or default ECA code. Anyhow, if the CTC team is worried about the small differences, e.g. rms diff: 1.8E-30 avg rel diff(npos): 7.0E-21, I would be happy to conduct longer simulations.

BTW, in the testing of some other codes that I am aware of, we often do not test BFB, rather we test the model output to a certain prescribed accuracy, e.g. 1.e-15, so that any values smaller than 1.e-15 is considered harmless. I think the NonBFB failure here is just exaggerating the alarm.
@rljacob do you have any thoughts on this? Thanks

@rljacob
Copy link
Member

rljacob commented Jan 28, 2020

You need to change the title of this PR. It should be something like "Fix inconsistency in c12 maintenance respiration calculation." The value of pheno_flux_limiter doesn't matter then.

We always check BFB in our baseline testing so that unexpected diffs are caught. Allowing an expected non-BFB change is ultimate up to the group leads.

@jinyuntang jinyuntang changed the title Fixed C mass blanace error when use_pheno_flux_limiter=.true. Fix inconsistency in c12 maintenance respiration calculation Jan 28, 2020
@jinyuntang
Copy link
Contributor Author

@rljacob just did it.

@jinyuntang jinyuntang removed the BFB PR leaves answers BFB label Jan 29, 2020
@thorntonpe
Copy link
Contributor

@jinyuntang Thanks for the further details. I do understand the numerical issues leading to non-BFB with a change in order of operations. I did not understand that this change was intended to operate outside of the use_pheno_flux_limiter = .true. case. So BFB is not necessarily expected. Changing the title of the PR and removing the BFB label are good first steps, as done. I will look at the code changes with this in mind.

@jinyuntang
Copy link
Contributor Author

@thorntonpe Sounds great! I think I was too optimistic when I made the change, but then the details showed its devil face during the broader test than the one I did.

@rljacob rljacob added the non-BFB PR makes roundoff changes to answers. label Feb 20, 2020
For ELM, when use_pheno_flux_limiter=.true., the components
of maintance respiration are updated to avoid negative carbon
state variables. However, the total maintenance respiration
is not updated accordingly, causing an overesimation of total
maintenance respiration mr. Now this bug is fixed by ensuring
maintenance respiration is always recalculated by summing up
its components.

removed some obsolte comment in ELM

Some comment lines are removed to avoid future confusion.
@bishtgautam bishtgautam force-pushed the jinyuntang/lnd/c_inbal_fix branch from 526288b to ee495a5 Compare February 20, 2020 18:42
bishtgautam added a commit that referenced this pull request Feb 20, 2020
For ELM, when use_pheno_flux_limiter=.true., the components
of maintance respiration are updated to avoid negative carbon
state variables. However, the total maintenance respiration
is not updated accordingly, causing an overesimation of total
maintenance respiration mr. Now this bug is fixed by ensuring
maintenance respiration is always recalculated by summing up
its components.

[non-BFB] due to very tiny value changes.

Fixes #3256
@bishtgautam bishtgautam merged commit 2649126 into master Mar 17, 2020
rljacob added a commit that referenced this pull request Mar 17, 2020
rljacob added a commit that referenced this pull request Mar 17, 2020
Fixes the incorrect merge of #3257 in 2649126

Fixes #3499

[BFB]
@bishtgautam bishtgautam deleted the jinyuntang/lnd/c_inbal_fix branch October 2, 2020 17:43
rljacob pushed a commit that referenced this pull request Apr 21, 2021
In #3257, the namelist_definition_drv.xml also needed to be updated. Change sst_aquap11 to sst_aquap_constant.

Test suite:
Test baseline:
Test namelist changes: Tested using QPRCEMIP compset
Test status: [bit for bit, roundoff, climate changing]

Fixes [CIME Github issue #] none (issue was opened in ESCOMP/CAM as issue 39, but discovered the error was in cime)

User interface changes?:

Update gh-pages html (Y/N)?:

Code review:
rljacob pushed a commit that referenced this pull request May 6, 2021
Change globally uniform SST mode from AQP11 to AQPCONST
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix PR Land non-BFB PR makes roundoff changes to answers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ELM Carbon mass balance error when use_pheno_flux_limiter=.true.
5 participants