
14 pft case has difficulty maintaining demotion/promotion precision #494

Closed · rgknox opened this issue Feb 27, 2019 · 7 comments

rgknox (Contributor) commented Feb 27, 2019

In long-term global tests on cheyenne, I am able to generate a crash with fates using the 14 default pfts. The crash occurs during canopy promotion/demotion. During promotion/demotion, the algorithm attempts to split and fuse cohorts so that their canopy areas perfectly match the amount of patch space they have to occupy. We have a very strict precision requirement: the final crown area of non-understory layers must match the patch area within a relative difference of 1e-9.
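For reference, a minimal sketch of the kind of relative-precision check being described; the variable names here are illustrative, not the actual FATES variables, and the numbers are made up to show a case that just trips the tolerance:

```fortran
program area_check_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: area_tol = 1.0e-9_r8       ! relative tolerance on layer area
  real(r8) :: patch_area, layer_crown_area

  patch_area       = 500.0_r8             ! patch area available to the layer (m2)
  layer_crown_area = 500.0_r8 + 7.0e-7_r8 ! summed crown area after promotion/demotion (m2)

  ! With many cohorts split and fused, accumulated roundoff of this size is
  ! enough to exceed the 1e-9 relative tolerance and trigger the failure.
  if (abs(layer_crown_area - patch_area) / patch_area > area_tol) then
     write(*,*) 'layer crown area does not match patch area within tolerance'
  else
     write(*,*) 'layer crown area matches patch area within tolerance'
  end if
end program area_check_sketch
```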

My best guess is that this crash is occurring due to the new height-based ordering scheme. I've been running 14 pft global long-term tests off and on for a while, and these changes seem most likely to be the trigger.

I don't think there is anything wrong with the algorithm; my sense is that we are simply challenging the demotion/promotion scheme to hit a difficult precision target, with so many cohorts being split and fused simultaneously, each with its own numerical precision losses.

My original intent in setting the precision so fine was to prevent errors that would occur when we had incredibly small patches on the order of the area precision (i.e. much smaller than 1), which would cause problems in the radiation algorithm.

ckoven (Contributor) commented Feb 27, 2019

Is it possible this is a resurfacing of the issue that led to #447? I.e., that since we don't conserve crown area during fusion, it needs to iterate to converge on a solution?

rgknox (Contributor, Author) commented Feb 27, 2019

hmmm, good point @ckoven .

rgknox (Contributor, Author) commented Feb 27, 2019

I'm going to run a test that merges in your changes and see how it does when conserving area.
We should decide as a group whether we want to keep supporting both conservation types. If it turns out that non-crown-area-conserving fusion is dominating this precision loss (which I think it is), we should lower the precision requirement and bump up the number of allowed iterations.

rgknox (Contributor, Author) commented Feb 28, 2019

@ckoven, I've been running tests with a branch that merges in your changes, but I am still generating errors. I've been tracking them, and I believe the problems have something to do with the algorithm we use for calculating the fraction of a cohort that is demoted/promoted, particularly in the case where the probabilistic weighting scheme wants to demote more area than is available in the cohort.
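To illustrate the failure mode only: a minimal sketch of one simple way to cap the requested demotion at the cohort's own crown area. The names are hypothetical and this is not the scheme written up in the linked document; in practice any clamped-off area would still need to be redistributed to other cohorts.

```fortran
program demotion_clamp_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8) :: cohort_crown_area, requested_demotion, demoted_area, frac_demoted

  cohort_crown_area  = 12.5_r8   ! crown area the cohort actually has (m2)
  requested_demotion = 14.0_r8   ! area the probabilistic weighting asks to demote (m2)

  ! Clamp so the demoted area can never exceed the cohort's crown area.
  demoted_area = min(requested_demotion, cohort_crown_area)
  frac_demoted = demoted_area / cohort_crown_area

  write(*,'(a,f6.3)') 'fraction of cohort demoted: ', frac_demoted
end program demotion_clamp_sketch
```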

We have provisions to make these corrections, but they involve an iterative search process that is inefficient and may also be compounding precision errors. I've been thinking of ways to make this calculation cleaner and wrote them down in a document here:

https://drive.google.com/file/d/1cyylwhtBzoRSPGIwS_Hr4Ma0vgwisLWz/view?usp=sharing

I will implement this idea and see if the errors associated with promoting and demoting decrease.

rgknox mentioned this issue Mar 1, 2019

rgknox (Contributor, Author) commented Mar 4, 2019

One bug that I have found is related to litter generation and the termination of cohorts that get pushed out of the final canopy layer during demotion.

See here, https://github.com/NGEET/fates/blob/master/biogeochem/EDCanopyStructureMod.F90#L637

The assumption at that line of the code has been that each cohort in the last layer has been split into two: the copy stays in the current layer, and the "currentCohort" is demoted. So if you are in the last canopy layer, the demoted "side" of the cohort is terminated and sent to the litter pool. However, if the cohort is flagged as having negligible demoted area, it isn't even split. So we have been sending cohorts which shouldn't be demoted at all to the litter pool. This has avoided mass balance errors up to this point, but it now seems to be generating errors (rightfully), either because we have more canopy layers or just because we are stressing the system.
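A minimal sketch of the kind of guard being described, assuming a simplified stand-in for the cohort structure (the type, variable names, and thresholds below are illustrative, not the real EDCanopyStructureMod code):

```fortran
program demotion_termination_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)

  type :: cohort_sketch
     real(r8) :: demoted_area    ! crown area flagged for demotion (m2)
     integer  :: canopy_layer    ! canopy layer index after demotion
  end type cohort_sketch

  integer,  parameter :: nclmax   = 2         ! deepest allowed canopy layer
  real(r8), parameter :: area_eps = 1.0e-9_r8 ! "negligible area" threshold
  type(cohort_sketch) :: ccohort

  ccohort%demoted_area = 0.0_r8        ! negligible demotion: the cohort was never split
  ccohort%canopy_layer = nclmax + 1

  ! Only terminate and send to litter if the cohort was actually split
  ! (non-negligible demoted area) and pushed below the deepest layer.
  if (ccohort%canopy_layer > nclmax .and. ccohort%demoted_area > area_eps) then
     write(*,*) 'terminate demoted cohort and send its mass to litter'
  else
     write(*,*) 'keep cohort; do not send it to litter'
  end if
end program demotion_termination_sketch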

Currently working on a patch.

rosiealice (Contributor) commented
yeah, looks like that whole thing should check if (cc%canopy_layer > nclmax) before all the litter production...

rgknox (Contributor, Author) commented Apr 9, 2019

closing via #501

rgknox closed this as completed Apr 9, 2019