-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: inflation files when using 'no copy back' variables #276
Comments
there are 2 issues here. one is scientific and one is implementation related. scientific: one option is to set the initial inflation to 1.0 and never change it. or set an initial inflation value (how much?) to match the rest of the state. inflation is needed when the spread of the ensemble is too small - so the answer might depend on what the field is. if it has enough spread to compute a forward operator value with enough ensemble spread then no inflation is fine. if you're using adaptive inflation, it is going to compute new inflation values but the field isn't going to be updated so it isn't clear you want to save those. implementation: the issue is that on read every field needs a value and if you're creating new inflation files (instead of updating existing files) they won't have all the fields . so the solution was to create an inflation file with values for all the input fields (using fill_inflation_restart, for example) and then copying it to where the output inflation is going to be and letting filter only update the adaptive parts. i don't see any other logical way to handle this. it doesn't seem like a bad solution. if you tried to write extra fields to only the inflation files you'd be complicating the parallel i/o code, and it's not clear you want to write new values over old values anyway. does anyone else remember issues with this that i'm forgetting? |
p.s. this affects any models that allow missing values in the state. these are rare, fortunately. |
I guess I remember the IO code dealing with inflation files differently. I think this affects any model that has NO COPY BACK, e.g. WRF. missing_r8 in the state is another issue on top. In adding this as a GitHub issue, I was thinking this would just be a note to keep in mind for future development. For example, the a hybrid with a giant ensemble might have copies that you don't want to treat the same as regular ensemble copies. Or you do actually want to look at what the inflation is doing for science reasons. Note I don't know if this is important, but at the moment we are letting the model dictate what the inflation files look like. |
maybe this is for issue #277 |
i did get them mixed up. for this one: this does affect any models that have NO_COPY_BACK as an option. |
there are slight differences higher in the calling stack, e.g. inflation copy values aren't clamped and the filenames are fixed. but by the time you get down to the code that collects the ensemble member fields onto separate tasks and calls the netcdf write code, every output file looks the same - ensemble members, ensemble mean, ensemble sd, inflation mean and inflation sd. |
Just a comment from me: If the user doesn't want DART to impact these variables during the update using a "no-copy-back" flag, then they shouldn't be inflated either. Inflation is in essence a mechanism to aid the update and so if the update won't happen then inflation shouldn't happen either. Inflation will change the value of the ensemble variable and the user doesn't want that. So, to me if a "no-copy-back" is requested for variable x then we DON'T inflate variable x. The only question remains is how to implement that efficiently in the code or the script without a lot of intrusive action. |
something we've talked about for a long time is having a different mechanism for fields which are only needed to compute a forward operator but aren't going to be changed by the assimilation. in addition to interpolating state values, there are 4 categories of non-state data that can be needed by forward operators:
|
|
there is the additional issue that just setting them NO_COPY_BACK doesn't automatically prevent those fields from getting updated by the assimilation. the user would have to take the additional step of having the model_mod get_close() routine return very large distances from an obs to those fields, or use the obs_impact tool to avoid updating those fields. if the values are updated, the diagnostic posterior obs values will be computed using updated field values which are not going to be written and saved. |
To echo what was said before, one way to approach this would be to set the initial inflation to 1.0 and the "no-copy-back" variables should never be inflated. In fact, it seems like a bad idea in general to set the initial inflation to anything but 1.0. Applying an initial inflation >1.0, gives a spatially uniform inflation value to the entire spatial domain, even if an observation is not nearby -- which seems unnecessary. I assume there are situations where setting an initial inflation to anything other than 1.0 is useful, otherwise this would not be an option in Also, to make sure I am clear about this -- is DART purposefully designed such that inflation is applied to a state variable that is just being used by the forward operator, or is that just a side effect of the code implementation? I understand that Inflation is designed/intended to be applied to the (prognostic) variables which are also updated during the filter step, but am still unclear about the forward operator variables. |
so the other way of looking at this is that the inflation files (edit: when created from scratch by DART) are the correct size, but the state vector is too big. state vector (ensemble size x variables to be inflated and updated) At the moment the non-state is lumped in with the state in DART.
|
see my answers/opinions to your (very good) questions below:
yes
probably not. CLM has no model-specific get_close routine so it's using the one in the location mod which has no knowledge of NO_COPY_BACK fields. i know one model that avoids updates in get_close and it isn't even marked as NO_COPY_BACK (but maybe should be?) is POP which has a field in the state which indicates land/ocean grid points. we explicitly look for that in get_close and set the distance to be huge. using the obs_impact tool (which is newer than the POP code) would at least skip calling update code for dry land in the state loop. it wouldn't keep the land field values out of the get_close list, but the loop would cycle sooner with obs_impact, i think. i'm not sure if you'd see a detectable performance improvement unless you could skip the broadcast. (i haven't traced the code path to see if that's possible.)
it might - but the current adaptive localization mechanism is imprecise in many ways. for openers, it only computes in 2d rather than 3d. also, it assumes observations are distributed evenly in space. see comments and code in
yes
if people set up the inflation files so non-state data always has 1.0 as the inflation factor then no. even if the inflation code computes an updated value it isn't going to be used until the following assimilation step. if it isn't written out, it's discarded. otherwise, the answer to your question is yes. where you do get "wrong" forward operator values is since the default is that the assimilation updates the NO_COPY_BACK values just like everything else, the prior forward operator values are fine (assuming no inflation or inflation of 1.0) but the posterior forward operator values (which are diagnostic only) are based on updated values and so aren't consistent with the original values.
if adaptive inflation is being used, then yes, extra work is done to compute values which could be skipped since they aren't written out and used in subsequent cycles.
this is the crux question. no, you're not missing anything. the way this was implemented was to create the smallest footprint in the code. nothing in the main dart code knows about non-state fields except the i/o code that doesn't write them back. the rest of the system is unaware of anything related to non-state values. as you point out, if you did this "right" there would be a lot of locations in the code that might have to know about non-state values. |
what is force_copy_bock for? Looks like it was intended to overwrite no_copy_back for particular ensemble copies. if ( do_prior_inflate ) & |
it think it was added when we made the fill_inflation_restart utility. we needed the i/o routines to write everything that was in the state regardless of what the copy back flag was set to for filter. |
I'm not super sure I buy that, but I don't think I understand the stages to write and copies yet.
filter setting by stage:
|
i wonder if some of the complexity is related to:
|
notes on copies and stages to write as I am reading the RTPS stuff for issue #353. The RTPS copy is outside the
There is a force_copy DART/assimilation_code/modules/assimilation/filter_mod.f90 Lines 2504 to 2506 in 8873fcd
what is this for if not forcing the write of a variable? The write_variables routine appears to be checking for this force:
related #106 🐳 |
same problem for creates files that do not have 'NO_COPY_BACK' variables |
related #560 |
@braczka this is the original discussion on 'NO_COPY_BACK' variables. |
Cool -- I forgot about this thread and will repose my question here. Let's take a soil moisture DA example from CLM. I have two variables in the DART state, the first variable H2OSOI_LIQ is the prognostic state variable which is UPDATED. The second variable is H2OSOI which is a diagnostic variable. We use this within the forward operator calculation -- in other words its the 'x' in the H(x). Because H2OSOI is diagnostic it is set to 'NO_COPY_BACK'. When diagnosing assimilation skill I often look at the time series of inflation values to determine if it is behaving properly. For this example I will look at the inflation time series for the prognostic H2OSOI_LIQ variable -- which is written to the clm_output_infprior_mean_d* file. My question is should I also be looking at the inflation values for the 'x' variable (H2OSOI) that goes into the calculation of the forward operator? Clearly this variable is also inflated and could influence the regression relationship between the expected and unobserved states of soil moisture. Because this 'x' variable is set to NO_COPY_BACK the inflation values stay at 1, but I could just as easily set to UPDATE, so the inflation values are written out. Scientifically -- its also not clear to me if the inflation for this 'x' variable should be preserved across assimilation cycles. Because H2OSOI is a function of H2OSOI_LIQ, the inflation applied to H2OSOI_LIQ also influences H2OSOI through the model integration. So maybe it doesn't matter and I shouldn't worry about it at all? Not sure. |
note from standup Nov 7th 2024: |
🐛 Your bug may already be reported!
Please search on the issue tracker before creating a new issue.
Describe the bug
This came up when reviewing the clm documentation. Here is the section of documentation:
Error Message
not sure, but I think incorrect sized inflation files
Which model(s) are you working with?
CLM, but this would affect any model that uses 'NO COPY BACK'
Version of DART
v9.11.8
Have you modified the DART code?
No
Build information
Independent of build
The text was updated successfully, but these errors were encountered: