Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantile methods #588

Merged
merged 272 commits into from
Jan 11, 2024
Merged

Quantile methods #588

merged 272 commits into from
Jan 11, 2024

Conversation

hkershaw-brown
Copy link
Member

@hkershaw-brown hkershaw-brown commented Dec 5, 2023

Description:

Adds a Quantile-Conserving Ensemble Filtering Framework to DART
Publications: QCEFF part1 QCEFF part 2.

The default QCEFF options are EAKF, normal distribution (no bounds).

This is a breaking change and will be v11 of DART
Namelist item removed from filter_nml

  • filter_kind - now a per qty option

Two new required namelists (added to input.nml files):

  • &probit_transform_nml
  • &algorithm_info_nml

assim_tools_mod

  • sort_obs_inc namelist option applied to ENKF only, so default is now .true.
  • spread_restoration is not supported in this version

algorithm_info_mod

  • runtime QCEFF options read from .csv or .txt file

New probability distribution modules:

  • beta_distribution_mod

  • bnrh_distribution_mod (bounded normal rank histogram)

  • gamma_distribution_mod

  • normal_distribution_mod

  • probit_transform_mod

  • distribution_params_mod

Update to lorenz_96_tracer_advection:

1d forward operator updates:

  • For non-integer powers, fix up values for negative bases

Docs:

  • index.rst: Nonlinear and Non-Gaussian Data Assimilation Capabilities in DART
  • QCEFF instructions: Quantile-Conserving Ensemble Filter Framework
  • Example to work through: QCEFF: Examples with the Lorenz 96 Tracer Model

Fixes issue

review notes #458
Note this pull request includes the fix for #416

Todo:

  • continuing existing experiments Feature Request: Using the QCF filter with existing assimilation experiments #477
  • Remove models/lorenz_96/work/README (readme file for papers)
  • remove qcf warning before merging into main: qceff_probit.rst, qceff-examples.rst, index.rst
  • change log add contributors (Chris Riedel)
  • After merge, create 'quantile_methods' tag (with a warning) so old readthedocs links point to docs. No, edit release notes not to point at quantile_methods.
  • change create_obs_sequence input file for lorenz_96 tracer?
  • developer_test from reading table algorithm_info_nml

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Please describe any tests you ran to verify your changes.
bitwise lorenz_96 with main for default qcf options
Jeff has run many algorithm tests.
Speed tests main vs. default qcf options Derecho

Timing:

Checklist for merging

  • Updated changelog entry
    ⚠️ ⚠️ ⚠️ ⚠️ 🧇
  • Documentation updated (remove WARNING)
    • qceff_probit.rst remove 'alpha release stage' ⚠️
    • qceff-examples.rst remove 'alpha release stage' ⚠️
    • index.rst remove WARNING ⚠️
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

jlaucar and others added 30 commits November 2, 2022 15:21
…ensemble members

are identical (sd is 0). The inflation, regression, and obs_increments are already
bullet-proofed for the 0 sd case. In the long-term, need to worry about what happens
if sd is very small as opposed to formally zero.
i8 to i4 conversion for gfortran in model_mod. See issue #352
for more details on various argument mismatches we will need to
fix for newer compilers
…form

of current observation. This code now reproduces across processor count
whereas it crashed with more than on process before.

Also changed the default parameters for the tracer flow part of the model
and the equation for finding the velocity.
… Fixed

bug in setting the filter type in algorithm_info.
…a numerical

error if the input state and the two smallest ensemble members were identical but
not all ensemble members were identical. Pretty rare situation in most applications.
Moved the normal distribution routines to their own module.
Newton search in gamma_distribution_mod to match the revised version
in the beta mod, and added a log_normal capability in quantile_distribution_mod
…d a beta_distribution

module that was developed by Chris Riedel. Added in testing code to the gamma, beta and
normal distributions modules and made the algorithms more efficient. Added a higher
accuracy inverse cdf for the normal module but it is switched off for now. It is much
more expensive in return for improved accuracy.
… by 0. This also addresses

the cases where ensemble members in the RH dupliate the bounds or each other in the interior.
This bitwise reproduces previous version for cases that do not have duplicates but has not been
tested for duplicates.
…model to include

a fixed sink so that the concentration goes to 0 remote from the source. This mimics the
kind of issues that are found with sea ice freezing entirely. Tests of this have
run successfully with a full suite of RH methods.
…oth upper and lower bounds

but were more serious for upper bound cases. This version works for long tests with a variety
of filter settings for both positive and negative tracers (bounded below and above) with
l96_tracer, but still needs much more comprehensive review and error checking.
a namelist parameter to turn the use of algorithm_info_mod on or off.
If use_algorithm_info_mod is false, default namelist parameters control
the filter as in previous versions.
…ormal in

quantile_distribution_mod to try to catch errors Molly is reporint with CICE.
Added use_algorithm_info namelist into perfect_model_obs and filter_mod to
control whether default namelist is used or algorithm_info_mod. Have now set
the default to .true. since this branch is supposed to be testing these
algorithm_info_mod capabilities.
…e that this

revealed the fact that gamma gamma products are not closed. If the sum of the
shape parameters of the prior and the likelihood is <= 1, the posterior is
not a gamma. An error message catches this, but it makes gammas a poor choice
for quantities that can get close to (or exactly to) the bounds.
…rameter

to flip the tracer from positive to negative to check assimilation code
for bounded above quantities.
with l96 tracer model. Removed sampling_error_correction from the input.nml
to avoid the need for copying over the sampling error table file to the
work directory. Sampling error correction is useful here but it's not
essential. Added the readme.rst even though it's not yet rst to enable
friendly testers to get started.
only typo changes, no other changes to text
Copy link
Collaborator

@johnsonbk johnsonbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the typos mentioned in the comments of this review in PR six twelve. Approving the six twelve merge will reconcile the typos and will add substantive edits to index.rst.

I added the substantive edits to index.rst because the quantile methods merge adds a lot of new information to the documentation that DART users may be unfamiliar with. It may be easier to read by following a standard practice in technical writing: using topic and stress positions in sentences. The topic position in a sentence contains information that a user already knows, or is presumed to know, while the stress position introduces new information.

After running through the four examples in guide/qceff-examples.rst and plotting their output with diagnostics/matlab/plot_ens_time_series.m, my conclusion is that

  1. Further exposition to the QCEFF examples should be added, or
  2. The examples should be modified to make the benefits of the new framework more evident


* NORMAL_DISTRIBUTION (default)
* BOUNDED_NORMAL_RH_DISTRIBUTION
* GAMMA_DISTRIBUTION
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still work when the sum of alpha and beta < 1?

guide/qceff_probit.rst Show resolved Hide resolved
guide/qceff-examples.rst Outdated Show resolved Hide resolved
guide/qceff-examples.rst Outdated Show resolved Hide resolved
guide/qceff_probit.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@mjs2369 mjs2369 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The &probit_transform_nml and &algorithm_info_nml are missing from models/MITgcm_ocean/work/input.nml

guide/qceff_probit.rst Outdated Show resolved Hide resolved
assimilation_code/modules/assimilation/assim_tools_mod.rst Outdated Show resolved Hide resolved
.. code-block:: text

git checkout quantile_methods

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines above (7-15) shoud be removed or edited before release. The tools will no longer be in the alpha release stage and the user will not have to checkout the quantile_methods branch

The Quantile-Conserving Ensemble Filter Framework (QCEFF) tools are in the alpha release stage.
The DART development team (dart@ucar.edu) would be happy to hear about your experiences and is
anxious to build scientific collaborations using these new capabilities.


To get started, make sure that you are on the quantile_methods branch of DART: 


.. code-block:: text


   git checkout quantile_methods

There is another comment like this at the very beginning of guide/qceff_probit.rst

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this branch is the live documentation, I'll remove the warnings just before merging.
I plopped them in the checklist:

Screen Shot 2024-01-05 at 9 55 53 AM

Thanks for the reminder (and future reminders so I/you/we don't miss this)


!!!elseif(p%distribution_type == PARTICLE_FILTER_DISTRIBUTION) then
!!!call to_probit_particle(ens_size, state_ens, p, probit_ens, use_input_p, &
!!!bounded_below, bounded_above, lower_bound, upper_bound)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still planning on supporting distribution_type = PARTICLE_FILTER_DISTRIBUTION in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove PARTICLE_FILTER_DISTRIBUTION from list of available distributions

DART/guide/qceff_probit.rst

Lines 123 to 132 in d5c2164

Available distributions
------------------------
* NORMAL_DISTRIBUTION (default)
* BOUNDED_NORMAL_RH_DISTRIBUTION
* GAMMA_DISTRIBUTION
* BETA_DISTRIBUTION
* LOG_NORMAL_DISTRIBUTION
* UNIFORM_DISTRIBUTION
* PARTICLE_FILTER_DISTRIBUTION

since it is not available in this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjs2369 I'm not sure when PARTICLE_FILTER_DISTRIBUTION will be supported,
I believe @jlaucar has experimented/tested with this.
For now I fixed the docs to match the code. Maybe PARTICLE_FILTER_DISTRIBUTION goes in v11.1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I agree that the PARTICLE_FILTER_DISTRIBUTION can be put into a later release if we want. Good catch with removing it from this release of the documentation.

Should we also remove it from this comment in algorithm_info_mod?

! UNIFORM_DISTRIBUTION, and PARTICLE_FILTER_DISTRIBUTION.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjs2369 it is not just the comment right? It is the code in algortihm_info_mod also:

case ('PARTICLE_FILTER_DISTRIBUTION')
qceff_table_data(row)%probit_state%dist_type = PARTICLE_FILTER_DISTRIBUTION

@jlaucar do you want particle_filter_distritbution in this release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should remove that as well. I feel like we the code would potentially take a while to be ready for this release and we should probably put it in with a later one

If we do as such, should we also remove the commented out code pertaining to it in probit_transform_mod? Lines 187-9 and the subroutine to_probit_particle starting on line 378

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjs2369 chatted with Jeff about this.
Conclusion: don't let people select PARTICLE_FILTER_DISTRIBUTION. Leave in the commented subroutine to_probit_particle (if we were being strict about code we would remove this, but this is science). I'll go ahead and remove case ('PARTICLE_FILTER_DISTRIBUTION') algorithm_info_mod.f90. Once again thanks for the detailed review on this.

Cheers,
Helen

@hkershaw-brown hkershaw-brown mentioned this pull request Jan 5, 2024
15 tasks
hkershaw-brown and others added 2 commits January 5, 2024 09:17
Co-authored-by: Marlena Smith <44214771+mjs2369@users.noreply.github.com>
@hkershaw-brown
Copy link
Member Author

I fixed the typos mentioned in the comments of this review in PR six twelve. Approving the six twelve merge will reconcile the typos and will add substantive edits to index.rst.

I added the substantive edits to index.rst because the quantile methods merge adds a lot of new information to the documentation that DART users may be unfamiliar with. It may be easier to read by following a standard practice in technical writing: using topic and stress positions in sentences. The topic position in a sentence contains information that a user already knows, or is presumed to know, while the stress position introduces new information.

doc changes merge in from #612 pull request.

@hkershaw-brown
Copy link
Member Author

The &probit_transform_nml and &algorithm_info_nml are missing from models/MITgcm_ocean/work/input.nml

Add these in 8ed6c2e

Add for any model/input.nml that was missing these.

#!/bin/bash
files=$(find . -name input.nml)

for f in $files
do
   if ! grep -q probit_transform_nml  $f; then
     echo probit not found $f
   fi

done

for f in $files
do
  if ! grep -q algorithm_info_nml  $f; then
     echo algorithm not found $f
   fi  
done

Copy link
Contributor

@braczka braczka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This read and worked well overall -- I agreed with Ben's main comment that the documentation could benefit from more explicit discussion of the improvement that QCEFF can bring to bounded/non-linear quantities. I made a suggestion to add a figure to the introduction, and then maybe more discussion at the end that differentiates between the performance of example A and C vs. example B.

index.rst Outdated Show resolved Hide resolved
index.rst Outdated Show resolved Hide resolved
guide/qceff_probit.rst Outdated Show resolved Hide resolved
guide/qceff_probit.rst Outdated Show resolved Hide resolved
guide/qceff_probit.rst Outdated Show resolved Hide resolved
guide/qceff-examples.rst Show resolved Hide resolved
guide/qceff-examples.rst Outdated Show resolved Hide resolved
guide/qceff-examples.rst Outdated Show resolved Hide resolved
#. Examine the output with your favorite tools. Looking at the analysis ensemble
for the tracer_concentration variables with indices near the source (location 1)
and far downstream from the source (location 35) is interesting. Note that the
source estimation capabilities of the model and filters are not being tested here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am too much in the weeds here, but why was the source quantity not influenced by the assimilation, and the updates restricted to the state(wind) and tracer concentration only? It seems that the source is an unobserved variable -- but maybe its not part of the Lorenz model, but a boundary condition?

guide/qceff_probit.rst Show resolved Hide resolved
hkershaw-brown and others added 4 commits January 5, 2024 15:41
…by sqrt(2PI) in obs_increment_rank_histogram

removed 'under development' & revised 2008.

#588 (comment)
Co-authored-by: Brett Raczka <bmraczka@ucar.edu>
Co-authored-by: Brett Raczka <bmraczka@ucar.edu>
@hkershaw-brown hkershaw-brown merged commit 52e6e45 into main Jan 11, 2024
4 checks passed
@hkershaw-brown hkershaw-brown deleted the quantile_methods branch January 11, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants