Update IASI YAML files for end-to-end testing #769

Merged

emilyhcliu
Collaborator

@emilyhcliu emilyhcliu commented Nov 27, 2023

This PR includes the following updates:

  1. update IASI YAML files for testing
  2. add IASI YAML files for config, with data thinning and diagnostic flags included
  3. add IASI YAML for BUFR to IODA processing
  4. add IASI to the data list for 3dvar

This PR has a paired UFO PR #3121 that fixes the dimension for sensorCentralWavenumber.
To test this PR, please check out the UFO branch feature/satrad from JCSDA-internal. This branch consolidates all proposed code changes to UFO for gdas validation:
UFO PR #3122 --- add cloud seeding (for all-sky)
UFO PR #3121 --- generalize handling for sensor wavenumber
UFO PR #3094 --- fix ObsError bug and minor fix for Hydrometeor check
UFO PR #3122 --- fix ObsBiasCovariance::linear bug

The test in global-workflow ran into an Out-Of-Memory (OOM) issue.

Therefore, this update was tested outside of global-workflow using the fv3 nomodel executable. The data filtering results are comparable between GSI and JEDI (check validation results here)

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Nov 27, 2023

Add 8 files changed in this PR to a working copy of gdas-validation. Configure gdas_prototype_3d.yaml to only process iasi_metop-a. Run prepatmiodaobs, atmanlinit, and atmanlrun. fv3jedi_var.x fails with

  0: QC iasi_metop-a brightnessTemperature_1574: 10120 passed out of 323980 observations.
177: Exception:         Reason: An exception occurred inside ioda while opening a variable.
177:    name:   MetaData/sensorCentralWavenumber
177:    source_column:  0
177:    source_filename:        /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp
177:    source_function:        ioda::Variable ioda::detail::Has_Variables_Base::open(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &) const
177:    source_line:    108
177:
177: Exception: oops::Variational<FV3JEDI, UFO and IODA observations> terminating...
  0: QC iasi_metop-a brightnessTemperature_1579: 260 missing values.
  0: QC iasi_metop-a brightnessTemperature_1579: 43142 out of domain of use.

@RussTreadon-NOAA You acted fast!! I just added a note to this PR that it comes with a paired UFO PR. Please see the description. You will need to check out the UFO branch feature/satrad in gdas-validation.

@emilyhcliu
Collaborator Author

emilyhcliu commented Nov 27, 2023

GDAS Validation Results (End-to-End) --- IASI Window Channel (921)

All Data (without Thinning) --- from BUFR to IODA
[Figure: ufo_iasi_metop-b_omf_bc_channel_921]

Comparison of Thinned Data between JEDI and GSI

Thinned and Data from the Edge of the Scan Removed (JEDI)
[Figure: ufo_iasi_metop-b_omf_qc_noscanedge_channel_921]

Thinned and Data from the Edge of the Scan Removed (GSI)
[Figure: gsi_iasi_metop-b_omf_gsi_bc_channel_921]

Comparison of Data Passed QC between JEDI and GSI

Data Passed QC (JEDI)
[Figure: ufo_iasi_metop-b_omf_qc_channel_921]

Data Passed QC (GSI)
[Figure: gsi_iasi_metop-b_omf_gsi_qc_channel_921]

Notes:

  1. Data thinning was tested, and the result is comparable to GSI. The thinning criteria included are:

    • equal area thinning box
    • priority is given to data closer to the center of the box
    • priority is given to data with clearer (more cloud-free) pixels in the field of view. This is necessary: without it, the amount of data passing QC is much reduced (~ 50%).
  2. The data passing QC from the GDASApp end-to-end run is comparable with GSI.

  3. The data thinning criteria should also include the surface type check (this is a work in progress).
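
The thinning criteria in note 1 can be sketched roughly as follows. This is a minimal illustration in plain Python, not the UFO or GSI thinning code: the `Obs` tuple, the `clear_fraction` field, the 5-degree box size, and the choice to rank clearness ahead of distance to the box center are all assumptions made for the example.

```python
import math
from collections import namedtuple

# Hypothetical observation record: location plus cloud-free fraction of the FOV.
Obs = namedtuple("Obs", "lat lon clear_fraction")


def box_of(lat, lon, box_deg):
    """Return (band, cell, center_lat, center_lon) of an approximately equal-area box."""
    n_bands = int(round(180.0 / box_deg))
    band = min(int((lat + 90.0) / box_deg), n_bands - 1)
    center_lat = -90.0 + (band + 0.5) * box_deg
    # Fewer longitude cells toward the poles keeps the box areas roughly equal.
    n_cells = max(1, int(round(360.0 / box_deg * math.cos(math.radians(center_lat)))))
    width = 360.0 / n_cells
    cell = min(int((lon % 360.0) / width), n_cells - 1)
    center_lon = (cell + 0.5) * width
    return band, cell, center_lat, center_lon


def thin(observations, box_deg=5.0):
    """Keep one observation per box: clearest first, then closest to the box center."""
    best = {}
    for ob in observations:
        band, cell, clat, clon = box_of(ob.lat, ob.lon, box_deg)
        dlon = (ob.lon % 360.0 - clon) * math.cos(math.radians(clat))
        dist = math.hypot(ob.lat - clat, dlon)
        # Lower key wins (assumed ordering: clearness, then distance to center).
        key = (-ob.clear_fraction, dist)
        if (band, cell) not in best or key < best[(band, cell)][0]:
            best[(band, cell)] = (key, ob)
    return [ob for _, ob in best.values()]
```

With three observations in the same box, the one with the highest `clear_fraction` is kept; when the clear fractions tie, the one nearest the box center is kept.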


ioda:
  backend: netcdf
  obsdataout: "{{ COM_OBS }}/{{ RUN }}.t{{ cyc }}z.mtiasi_$(splitvar).tm00.nc"
Contributor

Any reason why this needs to be mtiasi and not iasi? I ask because our recent convention has been $sensor_$platform, so that it matches the CRTM coefficients. But if there are different input streams, then it makes sense to distinguish them.

Collaborator Author

@cory Each IASI platform has three BUFR sources:
gdas.t00z.mtiasi_metop-a.tm00.bufr_d --- normal feed
gdas.t00z.esiasi_metop-a.tm00.bufr_d --- EARS feed
gdas.t00z.iasidb_metop-a.tm00.bufr_d --- direct broadcast feed

The one I am working on in this PR is the normal feed, so I use mtiasi in the filename to indicate that the file is converted from the normal feed.
In the future, I will need to merge these three files into one, and the combined IODA file will be called iasi_metop-a.
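
The naming scheme described above can be sketched like this. It is only an illustration: the helper names are hypothetical, and the full name of the future combined file (`.tm00.nc` suffix) is an assumption extrapolated from the `obsdataout` pattern under review.

```python
# Feed prefixes and descriptions as listed in the comment above.
FEEDS = {
    "mtiasi": "normal feed",
    "esiasi": "EARS feed",
    "iasidb": "direct broadcast feed",
}


def bufr_names(platform, cyc="00", run="gdas"):
    """BUFR input file names for one IASI platform, one per feed."""
    return [f"{run}.t{cyc}z.{feed}_{platform}.tm00.bufr_d" for feed in FEEDS]


def combined_ioda_name(platform, cyc="00", run="gdas"):
    """Assumed name of the future merged IODA file (suffix is a guess)."""
    return f"{run}.t{cyc}z.iasi_{platform}.tm00.nc"
```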

@RussTreadon-NOAA
Contributor

Install UFO branch feature/satrad in working copy of GDASApp feature/gdas-validation on Orion. Run 2021080100 prepatmiodaobs, atmanlinit, and atmanlrun. fv3jedi_var.x aborted with

  6: *** Error in `/work2/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x': malloc(): memory corruption (fast): 0x00000000173da7e9 ***
srun: error: Orion-20-12: task 187: Aborted
 25: Exception: UserError: Unable to find QC flags : brightnessTemperwture_1710@EffectiveQC0
srun: error: Orion-24-63: task 372: Aborted
 40: *** Error in `/work2/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x': corrupted size vs. prev_size: 0x0000000042065df0 ***
srun: error: Orion-24-64: task 375: Aborted
  8: Exception: UserError: Unable to find QC flags : brightnessTemperqture_1791@EffectiveQC0

Notice that the UserError message refers to brightnessTemperwture and brightnessTemperqture. Temperature is misspelled as Temperwture or Temperqture.

The log file with OOPS_DEBUG and OOPS_TRACE set to 1 is /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/logs/2021080100/gdasatmanlrun.log. A check of UserError finds various misspellings.

Orion-login-3:/work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/logs/2021080100$ grep -i unable gdasatmanlrun.log
 25: Exception: UserError: Unable to find QC flags : brightnessTemperwture_1710@EffectiveQC0
  8: Exception: UserError: Unable to find QC flags : brightnessTemperqture_1791@EffectiveQC0
 34: Exception: UserError: Unable to find QC flags : brightnessTempervture_2889@EffectiveQC0
 43: Exception: UserError: Unable to find QC flags : brightnessTemperzture_2921@EffectiveQC0
 11: Exception: UserError: Unable to find QC flags : brightnessTemper{ture_249@EffectiveQC0
 90: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_205@EffectiveQC0
 99: Exception: UserError: Unable to find QC flags : brightnessTemperxture_2119@EffectiveQC0
115: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_1430@EffectiveQC0
 36: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_472@EffectiveQC0
 60: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_104@EffectiveQC0
 85: Exception: UserError: Unable to find QC flags : brightnessTempervture_259@EffectiveQC0
 93: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_1947@EffectiveQC0
108: Exception: UserError: Unable to find QC flags : brightnessTemper~ture_2374@EffectiveQC0
 55: Exception: UserError: Unable to find QC flags : brightnessTemperyture_144@EffectiveQC0
103: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_1568@EffectiveQC0
119: Exception: UserError: Unable to find QC flags : brightnessTemper{ture_1991@EffectiveQC0
111: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_1659@EffectiveQC0
126: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2985@EffectiveQC0
187: Exception: UserError: Unable to find QC flags : brightnessTemperyture_1430@EffectiveQC0
174: Exception: UserError: Unable to find QC flags : brightnessTempersture_161@EffectiveQC0
225: Exception: UserError: Unable to find QC flags : brightnessTempertture_176@EffectiveQC0
290: Exception: UserError: Unable to find QC flags : brightnessTemperuture_1442@EffectiveQC0
307: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2910@EffectiveQC0
283: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_160@EffectiveQC0
205: Exception: UserError: Unable to find QC flags : brightnessTemperyture_1710@EffectiveQC0
221: Exception: UserError: Unable to find QC flags : brightnessTemperqture_161@EffectiveQC0
301: Exception: UserError: Unable to find QC flags : brightnessTemperzture_1946@EffectiveQC0
365: Exception: UserError: Unable to find QC flags : brightnessTemperwture_2333@EffectiveQC0
302: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2907@EffectiveQC0
318: Exception: UserError: Unable to find QC flags : brightnessTemperwture_212@EffectiveQC0
359: Exception: UserError: Unable to find QC flags : brightnessTempertture_3002@EffectiveQC0
360: Exception: UserError: Unable to find QC flags : brightnessTemperqture_2321@EffectiveQC0
362: Exception: UserError: Unable to find QC flags : brightnessTemperxture_1529@EffectiveQC0
370: Exception: UserError: Unable to find QC flags : brightnessTemperuture_2945@EffectiveQC0
379: Exception: UserError: Unable to find QC flags : brightnessTemper}ture_1913@EffectiveQC0
380: Exception: UserError: Unable to find QC flags : brightnessTemperqture_163@EffectiveQC0
  2: Exception: UserError: Unable to find QC flags : brightnessTemperuture_3518@EffectiveQC0
 32: Exception: UserError: Unable to find QC flags : brightnessTemperpture_3440@EffectiveQC0
  9: Exception: UserError: Unable to find QC flags : brightnessTemperuture_1946@EffectiveQC0
 17: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_3499@EffectiveQC0
 97: Exception: UserError: Unable to find QC flags : brightnessTempertture_3107@EffectiveQC0
105: Exception: UserError: Unable to find QC flags : brightnessTemperyture_3256@EffectiveQC0
 50: Exception: UserError: Unable to find QC flags : brightnessTempervture_2367@EffectiveQC0
 58: Exception: UserError: Unable to find QC flags : brightnessTempervture_3064@EffectiveQC0
 26: Exception: UserError: Unable to find QC flags : brightnessTempernture_3661@EffectiveQC0
 82: Exception: UserError: Unable to find QC flags : brightnessTemperuture_2948@EffectiveQC0
107: Exception: UserError: Unable to find QC flags : brightnessTempersture_3116@EffectiveQC0
 13: Exception: UserError: Unable to find QC flags : brightnessTempeture_3467@EffectiveQC0
 76: Exception: UserError: Unable to find QC flags : brightnessTemperuture_2760@EffectiveQC0
132: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_3055@EffectiveQC0
161: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2991@EffectiveQC0
185: Exception: UserError: Unable to find QC flags : brightnessTempeture_2990@EffectiveQC0
155: Exception: UserError: Unable to find QC flags : brightnessTemperjture_3518@EffectiveQC0
162: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_1884@EffectiveQC0
195: Exception: UserError: Unable to find QC flags : brightnessTemperzture_2907@EffectiveQC0
203: Exception: UserError: Unable to find QC flags : brightnessTempertture_1532@EffectiveQC0
150: Exception: UserError: Unable to find QC flags : brightnessTemperrture_1697@EffectiveQC0
142: Exception: UserError: Unable to find QC flags : brightnessTempernture_2819@EffectiveQC0
167: Exception: UserError: Unable to find QC flags : brightnessTemperwture_3256@EffectiveQC0
209: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2562@EffectiveQC0
200: Exception: UserError: Unable to find QC flags : brightnessTempertture_4032@EffectiveQC0
280: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_3252@EffectiveQC0
188: Exception: UserError: Unable to find QC flags : brightnessTemperoture_3450@EffectiveQC0
213: Exception: UserError: Unable to find QC flags : brightnessTempeture_1947@EffectiveQC0
292: Exception: UserError: Unable to find QC flags : brightnessTemperyture_2977@EffectiveQC0
305: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2991@EffectiveQC0
333: Exception: UserError: Unable to find QC flags : brightnessTemperxture_332@EffectiveQC0
327: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_2948@EffectiveQC0
376: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_3244@EffectiveQC0
353: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_3105@EffectiveQC0
368: Exception: UserError: Unable to find QC flags : brightnessTempervture_4068@EffectiveQC0
372: Exception: UserError: Unable to find QC flags : brightnessTempeture_3093@EffectiveQC0
 83: Exception: UserError: Unable to find QC flags : brightnessTemper{ture_2701@EffectiveQC0
 77: Exception: UserError: Unable to find QC flags : brightnessTemperuture_3036@EffectiveQC0
133: Exception: UserError: Unable to find QC flags : brightnessTemper|ture_2919@EffectiveQC0
159: Exception: UserError: Unable to find QC flags : brightnessTemper▒ture_1643@EffectiveQC0
358: Exception: UserError: Unable to find QC flags : brightnessTemperwture_3087@EffectiveQC0
202: Exception: UserError: Unable to find QC flags : brightnessTemperpture_2374@EffectiveQC0

Is this simply a printout error or indicative of a bug in the code?
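
One way to look at the grep output above is to tabulate what sits where the "ra" of brightnessTemperature should be. The sketch below is an assumption-laden helper (the function name and regular expression are made up for this illustration), not part of any JEDI tool. If the damage is always confined to the same position in the name while the rest of the string stays intact, that would be more consistent with memory being overwritten than with a typo in a YAML file.

```python
import re
from collections import Counter

# Capture whatever appears between "brightnessTempe" and "ture_<channel>".
# In an intact name this segment is "ra".
PATTERN = re.compile(
    r"Unable to find QC flags : brightnessTempe(.*?)ture_(\d+)@EffectiveQC0"
)


def corrupted_segments(lines):
    """Count the distinct corrupted segments found in UserError log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match and match.group(1) != "ra":
            counts[match.group(1)] += 1
    return counts
```

For example, `corrupted_segments(open("gdasatmanlrun.log", errors="replace"))` would summarize a whole log file.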

If we rerun without OOPS_DEBUG and OOPS_TRACE we actually get RMS and Jo values

  0: CostJo Observations: iasi_metop-a nobs= 198380866 Min=144.707, Max=339.184, RMS=249.434
  0:
  0:
  0: Jo Observations:
  0: iasi_metop-a nobs= 198380866 Min=144.707, Max=339.184, RMS=249.434
  0:
  0: End Jo Observations
  0: Jo Observations Equivalent:
  0: iasi_metop-a nobs= 199428152 Min=143.987, Max=374.41, RMS=254.334
  0:
  0: End Jo Observations Equivalent
  0: Jo Bias Corrected Departures:
  0: iasi_metop-a nobs= 198380866 Min=-66.0455, Max=132.478, RMS=11.192
  0:
  0: End Jo Bias Corrected Departures
  0: Jo Observations Errors:
  0: Diagonal observation error covariance
  0: iasi_metop-a nobs= 1003899 Min=0.500039, Max=9.11049, RMS=1.19772
  0:
  0: End Jo Observations Errors
  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 148974, nobs = 1003899, Jo/n = 0.148395, err = 1.19772
  0: CostJo   : Nonlinear Jo = 148974

See log file /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/logs/2021080100/gdasatmanlrun.log.0. This run also aborts with the same reference to misspelled variants of brightnessTemperature.
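
As a quick consistency check on the CostJo line in the log excerpt above, the reported Jo/n is simply Jo divided by nobs. The parser below is a hypothetical helper written for this illustration; its regular expression only assumes the line layout shown in the excerpt.

```python
import re

COSTJO = re.compile(
    r"Nonlinear Jo\((?P<name>[^)]+)\) = (?P<jo>[\d.eE+-]+), "
    r"nobs = (?P<nobs>\d+), Jo/n = (?P<jon>[\d.eE+-]+)"
)


def check_jo_per_obs(line):
    """Return (obs space name, recomputed Jo/n, reported Jo/n) from a CostJo line."""
    match = COSTJO.search(line)
    if match is None:
        return None
    jo = float(match.group("jo"))
    nobs = int(match.group("nobs"))
    return match.group("name"), jo / nobs, float(match.group("jon"))
```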

@emilyhcliu
Collaborator Author

@RussTreadon-NOAA I am investigating.

@RussTreadon-NOAA
Contributor

Does feature/satrad need to be built with specific snapshots of other JEDI repos?

@emilyhcliu
Collaborator Author

> Does feature/satrad need to be built with specific snapshots of other JEDI repos?

No, it does not.
I also keep the branch updated with develop.

I re-ran the iasi_metop-b case using the fv3 nomodel executable (outside of global-workflow) with srun -n 150. The run completed without failure and the results are good (comparable to GSI).

The misspelled brightnessTemper?ture reported as a UserError may be caused by OOM.
I tested iasi_metop-b in the global-workflow. I did not get the misspelled brightnessTemper?ture, but the run was terminated with the following message:

srun: error: slurm_send_recv_rc_msg_only_one to Orion-19-59:42424 : Zero Bytes were transmitted or received
308: fv3jedi_var.x: error: slurm_receive_msg: Socket timed out on send/recv operation

/work2/noaa/da/eliu/gdas-validation/comrot/gdas_eval_iasi_JEDI/logs/2021080100/gdasatmanlrun.log

I re-ran the ATMS case in the global-workflow, which had run successfully before. The ATMS run was killed without an obvious reason. Here is the message in the log:

srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
  0: slurmstepd: error: *** STEP 15831942.0 ON Orion-23-61 CANCELLED AT 2023-11-28T12:31:50 ***
srun: error: Orion-23-61: tasks 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37: Killed
srun: Terminating StepId=15831942.0
srun: error: Orion-23-61: tasks 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38: Killed

/work2/noaa/da/eliu/gdas-validation/comrot/gdas_eval_atms_JEDI/logs/2021080100/gdasatmanlrun.log

@RussTreadon-NOAA
Contributor

I would expect fv3jedi_var.x to run to completion with ATMS. This is odd. Let me try a clean build of g-w with the ufo cloned by default with GDASApp.

@emilyhcliu
Collaborator Author

> I would expect fv3jedi_var.x to run to completion with ATMS. This is odd. Let me try a clean build of g-w with the ufo cloned by default with GDASApp.

After my issue with the global-workflow setup step was resolved, I set up a new experiment for ATMS, which should run to completion without a problem.

I still got a workflow exception (see below):
Command exited with status 137. This seems to be associated with memory use; it occurs when a process is terminated because it is using too much memory.

srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
  0: slurmstepd: error: *** STEP 15844894.0 ON Orion-20-33 CANCELLED AT 2023-11-29T12:04:42 ***
srun: error: Orion-20-33: tasks 0-38: Killed
srun: Terminating StepId=15844894.0
srun: error: Orion-20-38: tasks 194-231: Killed
srun: error: Orion-20-39: tasks 232-269: Killed
srun: error: Orion-20-42: tasks 346-383: Killed
srun: error: Orion-20-40: tasks 270-307: Killed
srun: error: Orion-20-37: tasks 156-193: Killed
srun: error: Orion-20-41: tasks 308-345: Killed
srun: error: Orion-20-35: tasks 78-116: Killed
srun: error: Orion-20-34: tasks 39-77: Killed
srun: error: Orion-20-36: tasks 117-155: Killed
2023-11-29 12:04:44,545 - INFO     - root        : BEGIN: wxflow.exceptions.__init__
2023-11-29 12:04:44,545 - DEBUG    - root        : ( WorkflowException('An error occured during execution of srun -l --export=ALL -n 384 --cpus-per-task=1 /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/fv3jedi_var.x /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/gdas.t00z.atmvar.yaml'), 'An error occured during execution of srun -l --export=ALL -n 384 --cpus-per-task=1 /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/fv3jedi_var.x /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/gdas.t00z.atmvar.yaml' )
2023-11-29 12:04:44,545 - ERROR    - root        : An error occured during execution of srun -l --export=ALL -n 384 --cpus-per-task=1 /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/fv3jedi_var.x /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/gdas.t00z.atmvar.yaml
2023-11-29 12:04:44,545 - ERROR    - root        : An error occured during execution of srun -l --export=ALL -n 384 --cpus-per-task=1 /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/fv3jedi_var.x /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/gdas.t00z.atmvar.yaml
2023-11-29 12:04:44,545 - INFO     - root        :   END: wxflow.exceptions.__init__
2023-11-29 12:04:44,546 - DEBUG    - root        :  returning: None
Traceback (most recent call last):
  File "/work2/noaa/da/eliu/gdas-validation/global-workflow/ush/python/pygfs/task/atm_analysis.py", line 130, in execute
    exec_cmd()
  File "/work2/noaa/da/eliu/gdas-validation/global-workflow/ush/python/wxflow/executable.py", line 230, in __call__
    raise ProcessError(f"Command exited with status {proc.returncode}:", long_msg)
wxflow.executable.ProcessError: Command exited with status 137:
'srun' '-l' '--export=ALL' '-n' '384' '--cpus-per-task=1' '/work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/fv3jedi_var.x' '/work/noaa/stmp/eliu/RUNDIRS/gdas_eval_atms_JEDI/gdasatmanl_00/gdas.t00z.atmvar.yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/work2/noaa/da/eliu/gdas-validation/global-workflow/scripts/exglobal_atm_analysis_run.py", line 22, in <module>
    AtmAnl.execute()
  File "/work2/noaa/da/eliu/gdas-validation/global-workflow/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work2/noaa/da/eliu/gdas-validation/global-workflow/ush/python/pygfs/task/atm_analysis.py", line 134, in execute
    raise WorkflowException(f"An error occured during execution of {exec_cmd}")
wxflow.exceptions.WorkflowException
+ JGLOBAL_ATM_ANALYSIS_RUN[1]: postamble JGLOBAL_ATM_ANALYSIS_RUN 1701280956 1
+ preamble.sh[68]: set +x
End JGLOBAL_ATM_ANALYSIS_RUN at 18:04:44 with error code 1 (time elapsed: 00:02:08)
+ atmanlrun.sh[1]: postamble atmanlrun.sh 1701280933 1
+ preamble.sh[68]: set +x
End atmanlrun.sh at 18:04:44 with error code 1 (time elapsed: 00:02:31)
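
The status 137 in the traceback above follows the common POSIX shell convention of 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). That is what a process receives when it is killed from outside, for example by an out-of-memory handler, though the status alone does not say who sent the signal. A minimal sketch of the decoding, with a hypothetical helper name:

```python
import signal


def describe_exit_status(status):
    """Decode an exit status using the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"
```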

@RussTreadon-NOAA
Contributor

Orion tests

Successfully ran fv3jedi_var.x to completion with iasi_metop-a using 384 tasks (8 x 8 layout), ppn=8, threads=1. OOPS_STATS reports {min,max} per-task memory usage of 17.38 to 17.53 GB. Orion compute nodes have 192 GB. 17.5 * 8 is 140 GB, so this fits on Orion compute nodes.
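
The arithmetic behind this configuration can be written out as a small check. The helper below is hypothetical and only restates the numbers quoted above; the factor of 6 is the number of cubed-sphere tiles, which is how an 8 x 8 layout gives 384 tasks.

```python
def node_memory_check(layout_x, layout_y, ppn, max_task_gb, node_gb, tiles=6):
    """Return (tasks, nodes, memory per node in GB, fits) for a given layout."""
    tasks = tiles * layout_x * layout_y
    nodes = -(-tasks // ppn)  # ceiling division
    per_node_gb = ppn * max_task_gb
    return tasks, nodes, per_node_gb, per_node_gb <= node_gb
```

Calling `node_memory_check(8, 8, 8, 17.53, 192)` gives 384 tasks on 48 nodes at about 140 GB per node, which is under the 192 GB available.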

Caveat: I did not use the original yaml created by atmanlinit. Two modifications have been made:

  1. Turned off bias correction (removed the bias correction sections from the input yaml). Did so because the aborted fv3jedi_var.x run printed the message /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/build/bin/../lib64/libufo.so(_ZN3ufo17ObsBiasCovariance9linearizeERKNS_7ObsBiasERKN5eckit13ConfigurationE+0x93b)[0x2af13162757b]. Is there an error in the input yaml bias correction setup? Is there a bug in the variational bias correction code when processing so many channels?
  2. Removed CO2 from the CRTM absorbers list. Did so because the gdasatmanlrun log from the previous aborted run contains the message GeoVaLs field co2 has no known link to fields in model state. Recall old fv3-jedi issue #449. Is this still an issue?
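
The mangled symbol quoted in item 1 is easier to read once the nested name is pulled out. The standard tool for this is `c++filt`; the sketch below is only a partial, hand-rolled reader of the Itanium C++ ABI `_ZN<len><name>...` prefix (the function name is hypothetical, and argument types, templates, and constructor/destructor codes are ignored).

```python
def nested_name(mangled):
    """Extract the leading namespace::class::function part of a mangled symbol."""
    if not mangled.startswith("_ZN"):
        return None
    i = 3
    # Skip optional cv-qualifiers on the member function (r, V, K).
    while i < len(mangled) and mangled[i] in "rVK":
        i += 1
    parts = []
    # Each component is a decimal length followed by that many characters.
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])
        parts.append(mangled[j:j + length])
        i = j + length
    return "::".join(parts)
```

Applied to the symbol in item 1, this yields `ufo::ObsBiasCovariance::linearize`, which places the abort in the linearization of the bias covariance.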

@RussTreadon-NOAA
Contributor

IASI bias correction minimization error
Submit atmanlrun with CO2 removed from CRTM absorbers but retain bias correction in input yaml. fv3jedi_var.x aborts with the following traceback

  0: End Jo Bias Corrected Departures
  0: Jo Observations Errors:
  0: Diagonal observation error covariance
  0: iasi_metop-b nobs= 921793 Min=0.500039, Max=9.83365, RMS=1.19326
  0:
  0: End Jo Observations Errors
  0: CostJo   : Nonlinear Jo(iasi_metop-b) = 140886, nobs = 921793, Jo/n = 0.152839, err = 1.19326
  0: CostJo   : Nonlinear Jo = 140886
291: *** Error in `/work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x': corrupted size vs. prev_size: 0x000000004199d3a0 \
***
180: Exception: UserError: Unable to find QC flags : brightnessTemper\201ture_161@EffectiveQC0
291: ======= Backtrace: =========
291: /lib64/libc.so.6(+0x7f754)[0x2b1c19d62754]
291: /lib64/libc.so.6(+0x8184b)[0x2b1c19d6484b]
291: /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/build/bin/../lib64/libufo.so(_ZN4ioda9SelectionD1Ev+0x1c7)[0x2b1bfc376987]
291: /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/build/bin/../lib64/libioda.so(_ZNK4ioda8ObsSpace7loadVarIiEEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_RKSt6vectorIiSaIiEERSA_IT_SaISF_EEb+0x3f4)[0x2b1bfd554174]
291: /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/build/bin/../lib64/libufo.so(_ZN3ufo17ObsBiasCovariance9linearizeERKNS_7ObsBiasERKN5eckit13ConfigurationE+0x346)[0x2b1bfc351f86]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x582586]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x58241b]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x58233a]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x581f3f]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x57fea9]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x54071b]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x53aa1b]
291: /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/build/bin/../lib64/liboops.so(_ZN4oops3Run7executeERKNS_11ApplicationERKN5eckit3mpi4CommE+0x105)[0x2b1c0061c8a5]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x4a3c01]
291: /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b1c19d05495]
291: /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_JEDI/gdasatmanl_00/fv3jedi_var.x[0x4a3aa9]

First JEDI library reference is ufo. Is this a useful clue?
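
To make the library and function behind each frame easier to read, the mangled C++ symbols in the backtrace can be partially decoded. The following is a minimal Python sketch, not a full demangler: it decodes only the leading nested name (namespace, class, function) of an Itanium-ABI symbol and ignores argument types. `FRAME` and `qualified_name` are hypothetical helper names introduced here, and the frame strings are shortened to the library name plus symbol. A tool such as `c++filt` would give the complete signatures.

```python
import re

# Matches the "(symbol+0xoffset)" part of a glibc backtrace frame.
FRAME = re.compile(r"\((_Z\w+)\+0x[0-9a-f]+\)")


def qualified_name(mangled):
    """Decode only the leading nested name of an Itanium-ABI mangled symbol."""
    if not mangled.startswith("_ZN"):
        return None
    pos = 3
    # Skip cv-qualifiers on member functions (K = const, V = volatile, r = restrict).
    while pos < len(mangled) and mangled[pos] in "KVr":
        pos += 1
    parts = []
    # Each component is encoded as <decimal length><identifier>.
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts) if parts else None


# Symbol-bearing frames from the traceback above (paths shortened).
frames = [
    "libufo.so(_ZN4ioda9SelectionD1Ev+0x1c7)[0x2b1bfc376987]",
    "libioda.so(_ZNK4ioda8ObsSpace7loadVarIiEEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_RKSt6vectorIiSaIiEERSA_IT_SaISF_EEb+0x3f4)[0x2b1bfd554174]",
    "libufo.so(_ZN3ufo17ObsBiasCovariance9linearizeERKNS_7ObsBiasERKN5eckit13ConfigurationE+0x346)[0x2b1bfc351f86]",
    "liboops.so(_ZN4oops3Run7executeERKNS_11ApplicationERKN5eckit3mpi4CommE+0x105)[0x2b1c0061c8a5]",
]

names = [qualified_name(FRAME.search(f).group(1)) for f in frames]
for name in names:
    print(name)
```

Read this way, the innermost named frames are an `ioda::Selection` symbol (the trailing `D1` marks a destructor, which this sketch does not decode) and `ioda::ObsSpace::loadVar`, both reached from `ufo::ObsBiasCovariance::linearize`.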

Have the JEDI core team or JEDI partners assimilated IASI with variational bias correction active?

@RussTreadon-NOAA
Contributor

@CoryMartin-NOAA suggested retaining [H2O, O3, CO2] for the nonlinear CRTM iasi h(x) while using linear obs operator with only [H2O, O3] in the minimization. This change was tested and found to work ... only after commenting out the obs bias section. A fv3jedi_var.x run only processing iasi_metop-a ran to completion and generated the following initial & final Jo and final analysis increments

  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 212658, nobs = 922852, Jo/n = 0.230436, err = 1.1962
  0: CostJo   : Nonlinear Jo = 212658
  0: CostJb: FG-BG
...
  0: End Jo Observations Errors
  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 211130, nobs = 922852, Jo/n = 0.22878, err = 1.1962
  0: CostJo   : Nonlinear Jo = 211130
  0: CostJb: FG-BG
  0: ----------------------------------------------------------------------------------------------------
  0: Increment print | number of fields = 8 | cube sphere face size: C768
  0: eastward_wind                                | Min:+0.000000e+00 Max:+0.000000e+00 RMS:+0.000000e+00
  0: northward_wind                               | Min:+0.000000e+00 Max:+0.000000e+00 RMS:+0.000000e+00
  0: air_temperature                              | Min:-5.684342e-14 Max:+8.526513e-14 RMS:+3.547324e-16
  0: surface_pressure                             | Min:+0.000000e+00 Max:+0.000000e+00 RMS:+0.000000e+00
  0: specific_humidity                            | Min:-8.443693e-10 Max:+8.537149e-10 RMS:+2.331626e-12
  0: cloud_liquid_ice                             | Min:+0.000000e+00 Max:+1.618770e-20 RMS:+1.293217e-23
  0: cloud_liquid_water                           | Min:+0.000000e+00 Max:+1.474788e-19 RMS:+2.167418e-22
  0: ozone_mass_mixing_ratio                      | Min:-8.434685e-08 Max:+1.141620e-07 RMS:+3.431266e-10

The increments are small but the identity B is used for a single iteration. The strange order 1e-20 cloud ice and cloud water increments remain. We need to figure out where these increments come from. We should see 0.000000e+00 increments for these variables.

An attempt to run with both metop-a and metop-b iasi died with an OOM kill prior to entering the minimization. ppn=8 is too dense for two iasi instruments. I'll try ppn=6 or 4 later for both instruments. fv3jedi_var.x uses a lot of memory. What ppn will we need to use after we add CrIS?

fv3jedi_var.x runs to completion with VarBC active for atms. What's different when processing iasi besides there being many more channels? Does the fact that the channel numbers are not contiguous matter? Does it matter that we assimilate some iasi channels while monitoring others? Can JEDI VarBC handle this? Other ideas or possibilities?

@emilyhcliu
Collaborator Author

@RussTreadon-NOAA
I think you caught a bug in ObsBiasCovariance part.

228: Exception: UserError: Unable to find QC flags

This error message comes from the UFO ObsBiasCovariance::linearize code.
The IASI data is special. The generic IASI variable from BUFR is radiance, which is then transformed into brightnessTemperature in the UFO. So, IASI has two observation-related groups: DerivedObsValue and ObsValue. The ObsValue group stores radiance, and the DerivedObsValue group stores brightnessTemperature.

It seems to me that the ObsBiasCovariance code does not handle the dimension correctly when both DerivedObsValue and ObsValue exist. Let me investigate more and make sure my understanding is correct. I will see if I can fix it. If not, I will ask for help from JCSDA.

Thanks for your help in fixing the memory issue in the global-workflow and pointing out that the problem likely comes from the ObsBias part. We won't see the problem unless the memory issue gets fixed first.

@RussTreadon-NOAA
Contributor

Thank you, @emilyhcliu , for agreeing that we are likely dealing with a code bug and identifying the possible cause. Hopefully the fix is not too difficult.

fv3jedi_var.x ran to completion processing iasi metop-a and metop-b on 96 nodes with ppn=4, 1 thread. Per task memory usage ranged from 26.57 to 27.81 Gb with a fv3jedi_var.x run time of 1539.33 seconds.

My runs to date use an {8,8} layout. The two iasi run was repeated with a {10,10} layout. This required 150 nodes with ppn=4, 1 thread. Per task memory usage ranged from 18.26 to 18.39 Gb with fv3jedi_var.x run time of 1079.30 sec.
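
The task, node, and memory numbers quoted in these Orion tests can be cross-checked with a little arithmetic. The sketch below assumes a cubed-sphere run uses 6 tiles with layout_x * layout_y MPI tasks per tile, and takes the 192 Gb node size and the per-task maxima from the comments above. The function names are illustrative only, not part of global-workflow.

```python
import math

CUBE_FACES = 6          # assumption: one FV3 tile per cube face
ORION_NODE_GB = 192.0   # node memory quoted in the Orion tests above


def mpi_tasks(layout_x, layout_y):
    """Total MPI tasks for a given per-tile layout."""
    return CUBE_FACES * layout_x * layout_y


def nodes_needed(tasks, ppn):
    """Nodes required when placing ppn tasks on each node."""
    return math.ceil(tasks / ppn)


def node_memory_gb(max_task_gb, ppn):
    """Upper estimate of memory used on one node."""
    return max_task_gb * ppn


# (label, layout, ppn, reported max per-task memory in Gb)
runs = [
    ("one iasi, {8,8}", (8, 8), 8, 17.53),
    ("two iasi, {8,8}", (8, 8), 4, 27.81),
    ("two iasi, {10,10}", (10, 10), 4, 18.39),
]

summary = []
for label, layout, ppn, max_gb in runs:
    tasks = mpi_tasks(*layout)
    nodes = nodes_needed(tasks, ppn)
    used = node_memory_gb(max_gb, ppn)
    summary.append((label, tasks, nodes, used, used <= ORION_NODE_GB))
    print(f"{label}: {tasks} tasks, {nodes} nodes, {used:.2f} Gb per node")
```

Under the additional assumption that per-task memory does not change with ppn, the two-instrument {8,8} run at ppn=8 would need about 27.81 * 8 = 222 Gb per node, which exceeds 192 Gb and is consistent with the OOM kill reported earlier.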

The initial (pre-minimization) observation statistics differ between the {8,8} and {10,10} runs.

{10,10} layout

  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 212245, nobs = 920743, Jo/n = 0.230515, err = 1.19701
  0: CostJo   : Nonlinear Jo(iasi_metop-b) = 208661, nobs = 942762, Jo/n = 0.221329, err = 1.20146
  0: CostJo   : Nonlinear Jo = 420906

{8,8} layout

  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 212658, nobs = 922852, Jo/n = 0.230436, err = 1.1962
  0: CostJo   : Nonlinear Jo(iasi_metop-b) = 208662, nobs = 942763, Jo/n = 0.22133, err = 1.20146
  0: CostJo   : Nonlinear Jo = 421320

This seems wrong. Changing the layout shouldn't alter QC (and analysis) results. The GSI generates the same analysis independent of task and thread count. Is the above fv3jedi_var.x behavior simply a printout issue, or are the assimilated data counts truly different?
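
To quantify the difference, the CostJo lines from the two runs can be parsed and the observation counts subtracted. This is a small Python sketch using only the log lines quoted above; `JO_LINE`, `parse_jo`, and `nobs_diff` are hypothetical helper names.

```python
import re

# Captures the obs space name, Jo, and nobs from a "Nonlinear Jo(name)" line.
JO_LINE = re.compile(
    r"Nonlinear Jo\((?P<obs>[^)]+)\) = (?P<jo>[\d.eE+-]+), nobs = (?P<nobs>\d+)"
)

LAYOUT_10 = """
  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 212245, nobs = 920743, Jo/n = 0.230515, err = 1.19701
  0: CostJo   : Nonlinear Jo(iasi_metop-b) = 208661, nobs = 942762, Jo/n = 0.221329, err = 1.20146
"""

LAYOUT_8 = """
  0: CostJo   : Nonlinear Jo(iasi_metop-a) = 212658, nobs = 922852, Jo/n = 0.230436, err = 1.1962
  0: CostJo   : Nonlinear Jo(iasi_metop-b) = 208662, nobs = 942763, Jo/n = 0.22133, err = 1.20146
"""


def parse_jo(log_text):
    """Return {obs space name: (Jo, nobs)} for every per-instrument CostJo line."""
    return {
        m.group("obs"): (float(m.group("jo")), int(m.group("nobs")))
        for m in JO_LINE.finditer(log_text)
    }


def nobs_diff(run_a, run_b):
    """nobs(run_a) - nobs(run_b) for obs spaces present in both runs."""
    return {name: run_a[name][1] - run_b[name][1] for name in run_a if name in run_b}


diff = nobs_diff(parse_jo(LAYOUT_10), parse_jo(LAYOUT_8))
print(diff)
```

The {10,10} run reports 2109 fewer iasi_metop-a observations and 1 fewer iasi_metop-b observation than the {8,8} run.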

Tagging @DavidNew-NOAA since you are exploring memory, wall time, and reproducibility issues in JEDI applications. This is an important study as we incrementally move towards operational implementation of JEDI apps.

@emilyhcliu
Collaborator Author

emilyhcliu commented Dec 1, 2023

@RussTreadon-NOAA
I modified the UFO code related to ObsBiasCovariance::linearize. It is a one-line fix.
The change is in branch: feature/obsbiascov_fix at JCSDA-internal/ufo.

The IASI (with ObsBias included) test can run to completion with this fix.

  name: CRTM
  Absorbers: [H2O,O3,CO2]
  obs options:
    Sensor_ID: iasi_metop-b
    EndianType: little_endian
    CoefficientPath: $(DATA)/crtm/
  linear obs operator:
    Absorbers: [H2O,O3]

obs bias:
  input file: $(DATA)/obs/$(GPREFIX)iasi_metop-b.satbias.nc4
  output file: $(DATA)/bc/$(APREFIX)iasi_metop-b.satbias.nc4
  variational bc:
    predictors:
    - name: constant
    - name: lapse_rate
      order: 2
      tlapse: &iasi_metop-b_tlapse $(DATA)/obs/$(GPREFIX)iasi_metop-b.tlapse.txt
    - name: lapse_rate
      tlapse: *iasi_metop-b_tlapse
    - name: emissivity
    - name: scan_angle
      order: 4
    - name: scan_angle
      order: 3
    - name: scan_angle
      order: 2
    - name: scan_angle
  covariance:
    minimal required obs number: 20
    variance range: [1.0e-6, 10.0]
    step size: 1.0e-4
    largest analysis variance: 10000.0
    prior:
      input file: $(DATA)/obs/$(GPREFIX)iasi_metop-b.satbias_cov.nc4
      inflation:
        ratio: 1.1
        ratio for small dataset: 2.0
    output file: $(DATA)/bc/$(APREFIX)iasi_metop-b.satbias_cov.nc4

I also commented out the DiagnosticFlags in the YAML to save memory use.

The HofX and data filtering results are comparable with GSI.
Here is the log: /work2/noaa/da/eliu/gdas-validation/comrot/gdas_eval_iasi_JEDI/logs/2021080100/gdasatmanlrun.log

There are non-zero increments for clouds (they should be zero).
The ObsBias increments are approximately zero.

0: ----------------------------------------------------------------------------------------------------
  0: Increment print | number of fields = 8 | cube sphere face size: C768
  0: eastward_wind                                | Min:+0.000000e+00 Max:+0.000000e+00 RMS:+0.000000e+00
  0: northward_wind                               | Min:+0.000000e+00 Max:+0.000000e+00 RMS:+0.000000e+00
  0: air_temperature                              | Min:-8.526513e-14 Max:+5.684342e-14 RMS:+2.184088e-16
  0: surface_pressure                             | Min:+0.000000e+00 Max:+0.000000e+00 RMS:+0.000000e+00
  0: specific_humidity                            | Min:-8.818142e-10 Max:+8.551429e-10 RMS:+2.451350e-12
  0: cloud_liquid_ice                             | Min:+0.000000e+00 Max:+1.618770e-20 RMS:+1.293217e-23
  0: cloud_liquid_water                           | Min:+0.000000e+00 Max:+1.474788e-19 RMS:+2.167418e-22
  0: ozone_mass_mixing_ratio                      | Min:-9.332354e-08 Max:+1.071624e-07 RMS:+2.578888e-10
  0: ----------------------------------------------------------------------------------------------------OOPS_TRACE[0] Increment<MODEL>::print done
  0: OOPS_TRACE[0] ModelAuxIncrement<MODEL>::print starting
  0: OOPS_TRACE[0] ModelAuxIncrement<MODEL>::print done
  0: OOPS_TRACE[0] ObsAuxIncrement<OBS>::print starting
  0: ufo::ObsBiasIncrement::print
  0: ---------------------------------------------------------------
  0:             constant:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:   lapse_rate_order_2:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:           lapse_rate:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:           emissivity:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:   scan_angle_order_4:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:   scan_angle_order_3:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:   scan_angle_order_2:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0:           scan_angle:  Min=       -0.000000,  Max=        0.000000,  Norm=        0.000000
  0: ---------------------------------------------------------------
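
The eight labels in the ObsBiasIncrement print line up one-to-one with the eight predictor entries in the obs bias YAML above. The Python sketch below reproduces that mapping. The naming rule (append `_order_N` when an order other than 1 is given) is inferred from this log only, not taken from the UFO source, and `label` is a hypothetical helper. The tlapse entries are omitted because they do not affect the labels.

```python
# Predictor entries in the order they appear in the obs bias YAML.
predictors = [
    {"name": "constant"},
    {"name": "lapse_rate", "order": 2},
    {"name": "lapse_rate"},
    {"name": "emissivity"},
    {"name": "scan_angle", "order": 4},
    {"name": "scan_angle", "order": 3},
    {"name": "scan_angle", "order": 2},
    {"name": "scan_angle"},
]


def label(predictor):
    """Label inferred from the log: bare name for order 1, else name_order_N."""
    order = predictor.get("order", 1)
    if order == 1:
        return predictor["name"]
    return f"{predictor['name']}_order_{order}"


labels = [label(p) for p in predictors]
for text in labels:
    print(text)
```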

@CoryMartin-NOAA
Contributor

I think 1e-20 for a max value is close enough to zero for the clouds.

@CoryMartin-NOAA
Contributor

Wait, why are all the increments so small? Temperature increments on the order of 1e-14???
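
One way to put the 1e-14 temperature values in context is to compare them with double-precision spacing at atmospheric temperatures. The Python sketch below assumes the state is stored as float64 and that the affected temperatures lie between 256 K and 512 K; both are assumptions, not something the log confirms. It uses `math.ulp`, available in Python 3.9 and later.

```python
import math

# Spacing between adjacent float64 values near 300 K (same for any value in [256, 512)).
ulp_near_300k = math.ulp(300.0)

# air_temperature Min and Max from the increment print above.
reported = {"min": -8.526513e-14, "max": 5.684342e-14}

# Express each reported value as a multiple of that spacing.
ratios = {key: value / ulp_near_300k for key, value in reported.items()}

print(f"ulp near 300 K: {ulp_near_300k:.6e}")
for key, ratio in ratios.items():
    print(f"{key}: {ratio:+.4f} ulp")
```

The reported extremes come out at about 1 and 1.5 times that spacing. This suggests, but does not prove, that the air_temperature values are floating-point round-off rather than a real analysis increment.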

@emilyhcliu
Collaborator Author

emilyhcliu commented Dec 1, 2023

Here is the result from @RussTreadon-NOAA without ObsBias included.

The increments are small as well.

@CoryMartin-NOAA
Contributor

Thanks @emilyhcliu oh it's the identity B, ok, then it's probably fine. Thanks for the fix!

Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Added change from UFO PR #3131 to working copy of gdas-validation ufo. Recompile and rerun 2021080100. atmanlrun completed without error with VarBC active for iasi metop-a and metop-b.

Approve pending merger of required UFO PRs into authoritative develop.

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Dec 4, 2023

@emilyhcliu , the following UFO PRs have been merged into the authoritative UFO develop:

UFO PR #3122 is still a draft.

May we merge GDASApp PR #769, this PR, into GDASApp develop before UFO PR #3122 is reviewed, approved, and merged into the authoritative UFO develop?

@emilyhcliu
Collaborator Author

emilyhcliu commented Dec 4, 2023

@RussTreadon-NOAA
Here is the list of UFO PRs related to GDASApp code sprint:
UFO PR #3122 (https://github.com/JCSDA-internal/ufo/pull/3122) --- add cloud seeding (for all-sky)
UFO PR #3121 (https://github.com/JCSDA-internal/ufo/pull/3121) --- generalize handling for sensor wavenumber
UFO PR #3094 (https://github.com/JCSDA-internal/ufo/pull/3094) --- fix ObsError bug and minor fix for Hydrometeor check
UFO PR #3131 (https://github.com/JCSDA-internal/ufo/pull/3131) --- fix ObsBiasCovariance::linear bug

The PRs related to IASI are:
PR #3094 --- merged
PR #3131 --- approved; ready for merge

The other PRs are for MW:
PR #3121 --- merged
PR #3122 --- under review
I think we can merge this PR if there are no objections.
I will have a meeting with Ben Ruston from JCSDA this afternoon. I will remind him that PR #3131 is ready for merge.

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Dec 4, 2023

Thank you, @emilyhcliu , for the list of UFO PRs related to the GDAS-validation sprint. My question, however, is different.

You mention four UFO PRs in the description of this PR. Currently three of the four have been merged into jcsda-internal UFO develop. Must all four of these UFO PRs be merged into the jcsda-internal UFO develop before we (@CoryMartin-NOAA or I) merge this PR into GDASApp develop?

  • If no, I recommend merging this PR into GDASApp develop as soon as possible.
  • If yes, we need to keep this PR open until PR #3122 is merged into jcsda-internal UFO develop.

The UFO PR #3122 will only affect all-sky MW in the end-to-end test.
One question: do we want to merge this PR into develop instead of gdas-validation branch?

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA I think the validation sprint branch.

But that begs the larger question, when should the sprint branch be merged into develop? I'm not sure it's required (in my mind) that 100% reproducibility exists to go into develop but the code should not fail. Thus, I think that open draft UFO PR won't hurt to stay open as this gets merged into develop.

@RussTreadon-NOAA
Contributor

Agreed. I rebuilt gdas-validation on Orion. My snapshot of UFO is the authoritative develop at 3eda06c5. prepatmiodaobs, atmanlinit, and atmanlrun ran to completion.

The fact that UFO PR #3122 remains a draft does not produce a failure when the changes in GDASApp PR #769 are exercised in gdas-validation. Given this, I think this PR can be merged into feature/gdas-validation. That said, I'm not on the developer side of the gdas-validation PRs. I defer to the developers as to when it's the appropriate time to merge gdas-validation sprint PRs into develop.

Hence this conversation with @emilyhcliu to see if it's OK to go ahead and merge this PR into gdas-validation.

@RussTreadon-NOAA
Contributor

Absent objections from anyone, I will merge this PR into feature/gdas-validation no later than 12:00 pm EST today (12/5/2023).

@RussTreadon-NOAA RussTreadon-NOAA merged commit 30ab4f9 into feature/gdas-validation Dec 5, 2023
@CoryMartin-NOAA CoryMartin-NOAA deleted the feature/gdas-validation-iasi_update branch March 28, 2024 15:21