
14 pft case has difficulty maintaining demotion/promotion precision #494

Closed · rgknox opened this issue Feb 27, 2019 · 7 comments

rgknox (Contributor) commented Feb 27, 2019

In long-term global tests on cheyenne, I am able to generate a crash with fates using the 14 default pfts. The crash occurs during canopy promotion/demotion. During promotion/demotion, the algorithm attempts to split and fuse cohorts so that their canopy areas perfectly match the amount of patch space they have to occupy. We have a very strict precision requirement: the final crown area of non-understory layers must match the patch area within a relative difference of 1e-9.
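For reference, a minimal sketch of the kind of relative-precision check being described; the variable names here are illustrative, not the actual FATES variables, and the numbers are made up to show a case that just trips the tolerance:

```fortran
program area_check_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: area_tol = 1.0e-9_r8       ! relative tolerance on layer area
  real(r8) :: patch_area, layer_crown_area

  patch_area       = 500.0_r8             ! patch area available to the layer (m2)
  layer_crown_area = 500.0_r8 + 7.0e-7_r8 ! summed crown area after promotion/demotion (m2)

  ! With many cohorts split and fused, accumulated roundoff of this size is
  ! enough to exceed the 1e-9 relative tolerance and trigger the failure.
  if (abs(layer_crown_area - patch_area) / patch_area > area_tol) then
     write(*,*) 'layer crown area does not match patch area within tolerance'
  else
     write(*,*) 'layer crown area matches patch area within tolerance'
  end if
end program area_check_sketch
```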

My best guess is that this crash is occurring due to the new height-based ordering scheme. I've been running 14 pft global long-term tests off and on for a while, and these changes seem most likely to be the trigger.

I don't think there is anything wrong with the algorithm; my sense is that we are simply challenging the demotion/promotion scheme to hit a difficult precision target, with so many cohorts being split and fused simultaneously, each with its own numerical precision losses.

My original intent in setting the precision so fine was to prevent errors that would occur when we had incredibly small patches on the order of the area precision (i.e. much smaller than 1), which would cause problems in the radiation algorithm.

ckoven (Contributor) commented Feb 27, 2019

Is it possible this is a resurfacing of the issue that led to #447? I.e., that since we don't conserve crown area during fusion, it needs to iterate to converge on a solution?

rgknox (Contributor, Author) commented Feb 27, 2019

hmmm, good point @ckoven .

rgknox (Contributor, Author) commented Feb 27, 2019

I'm going to run a test that merges in your changes and see how it does when conserving area.
We should decide as a group whether we want to keep supporting both conservation types. If it turns out that non-crown-area-conserving fusion is dominating this precision loss (which I think it is), we should lower the precision requirement and bump up the number of allowed iterations.

rgknox (Contributor, Author) commented Feb 28, 2019

@ckoven, I've been running tests with a branch that merges in your changes, but I am still generating errors. I've been tracking them, and I believe the problems have something to do with the algorithm we use for calculating the fraction of a cohort that is demoted/promoted, particularly in the case where the probabilistic weighting scheme wants to demote more area than is available in the cohort.
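To illustrate the failure mode only: a minimal sketch of one simple way to cap the requested demotion at the cohort's own crown area. The names are hypothetical and this is not the scheme written up in the linked document; in practice any clamped-off area would still need to be redistributed to other cohorts.

```fortran
program demotion_clamp_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8) :: cohort_crown_area, requested_demotion, demoted_area, frac_demoted

  cohort_crown_area  = 12.5_r8   ! crown area the cohort actually has (m2)
  requested_demotion = 14.0_r8   ! area the probabilistic weighting asks to demote (m2)

  ! Clamp so the demoted area can never exceed the cohort's crown area.
  demoted_area = min(requested_demotion, cohort_crown_area)
  frac_demoted = demoted_area / cohort_crown_area

  write(*,'(a,f6.3)') 'fraction of cohort demoted: ', frac_demoted
end program demotion_clamp_sketch
```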

We have provisions to make these corrections, but they involve an iterative search process that is inefficient and may also be compounding precision errors. I've been thinking of ways to make this calculation cleaner and wrote them down in a document here:

https://drive.google.com/file/d/1cyylwhtBzoRSPGIwS_Hr4Ma0vgwisLWz/view?usp=sharing

I will implement this idea and see if the errors associated with promoting and demoting decrease.

rgknox mentioned this issue Mar 1, 2019

rgknox (Contributor, Author) commented Mar 4, 2019

One bug that I have found is related to litter generation and the termination of cohorts that get pushed out of the final canopy layer during demotion.

See here, https://github.com/NGEET/fates/blob/master/biogeochem/EDCanopyStructureMod.F90#L637

The assumption at that line of the code has been that each cohort in the last layer has been split into two: the copy stays in the current layer, and the "currentCohort" is demoted. So if you are in the last canopy layer, the demoted "side" of the cohort is terminated and sent to the litter pool. However, if the cohort is flagged as having negligible demoted area, it isn't even split. So we have been sending cohorts which shouldn't be demoted at all to the litter pool. This has avoided mass balance errors up to this point, but it now seems to be generating errors (rightfully), either because we have more canopy layers or just because we are stressing the system.
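A minimal sketch of the kind of guard being described, assuming a simplified stand-in for the cohort structure (the type, variable names, and thresholds below are illustrative, not the real EDCanopyStructureMod code):

```fortran
program demotion_termination_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)

  type :: cohort_sketch
     real(r8) :: demoted_area    ! crown area flagged for demotion (m2)
     integer  :: canopy_layer    ! canopy layer index after demotion
  end type cohort_sketch

  integer,  parameter :: nclmax   = 2         ! deepest allowed canopy layer
  real(r8), parameter :: area_eps = 1.0e-9_r8 ! "negligible area" threshold
  type(cohort_sketch) :: ccohort

  ccohort%demoted_area = 0.0_r8        ! negligible demotion: the cohort was never split
  ccohort%canopy_layer = nclmax + 1

  ! Only terminate and send to litter if the cohort was actually split
  ! (non-negligible demoted area) and pushed below the deepest layer.
  if (ccohort%canopy_layer > nclmax .and. ccohort%demoted_area > area_eps) then
     write(*,*) 'terminate demoted cohort and send its mass to litter'
  else
     write(*,*) 'keep cohort; do not send it to litter'
  end if
end program demotion_termination_sketch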

Currently working on a patch.

rosiealice (Contributor) commented
yeah, looks like that whole thing should check if (cc%canopy_layer > nclmax) before all the litter production...

rgknox (Contributor, Author) commented Apr 9, 2019

closing via #501

rgknox closed this as completed Apr 9, 2019