Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[production/AQM.v7] Bugfix/bias correction for Production/AQM.v7 #841

Conversation

JianpingHuang-NOAA
Copy link

  • Update Production/AQM.v7 to head at ufs-community

DESCRIPTION OF CHANGES:

This PR is used to address the recent failures of bias-correction for O3 and PM2.5 on the first day of each month as described by srweather-app issue#838. In addition, several other changes are included with this PR in order to run the package successfully.

  1. The updates of two ex-scripts of bias-correction for O3 and PM2.5
  2. The Path of bias-correction training data files
  3. The updates of ex-scripts for NEXUS which is used to address biogenic emissions

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

Kai Wang from EMC for bias-correction and Barry Baker from ARL for the biogenic emission related to NEXUS work.

@chan-hoo chan-hoo changed the title Bugfix/bias correction for Production/AQM.v7 [production/AQM.v7] Bugfix/bias correction for Production/AQM.v7 Jun 22, 2023
@JianpingHuang-NOAA
Copy link
Author

@chan-hoo Corrections have been made as suggested.

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, thanks for the change. I'm testing your PR now. Once it is completed successfully, I'll approve this PR.

@chan-hoo
Copy link
Collaborator

chan-hoo commented Jun 23, 2023

@JianpingHuang-NOAA, does this PR resolve the issues #791 and #749 as well?

@JianpingHuang-NOAA
Copy link
Author

JianpingHuang-NOAA commented Jun 23, 2023 via email

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, if this PR has a dependency (a PR in NEXUS), I'll test this PR later once it is merged. Please let me know once you submit a PR to NEXUS.

@JianpingHuang-NOAA
Copy link
Author

@chan-hoo Is there any issue for this PR? Can we get this PR merge done as soon as possible? We need to open another three PRs to address soil NOx emissions, updated of RAVE data, and EE2 compliance. Thanks

@JianpingHuang-NOAA
Copy link
Author

@RatkoVasic-NOAA @BenjaminBlake-NOAA @chan-hoo Can you help us to get this PR review done as soon as possible? We still have another several PRs to be merged. We need to make sure the workflow be ready for us to complete retrospective runs by next week. Thanks !

@chan-hoo
Copy link
Collaborator

chan-hoo commented Jun 27, 2023

@JianpingHuang-NOAA, you said the python file (concatenate_nexus_post_split.py) should be hard-copied to run this PR. It is not allowed. To merge this PR, the dependency must be merged first. Please merge the PR in NEXUS and update the hash of NEXUS.

@bbakernoaa
Copy link
Contributor

I merged this a little bit ago. We should be able to update the nexus has as 6a7a994

@chan-hoo @JianpingHuang-NOAA

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, please update the hash of nexus. Then I'll test this PR.

@JianpingHuang-NOAA
Copy link
Author

@chan-hoo Done.

@chan-hoo
Copy link
Collaborator

Testing this PR now.

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, the bias-correction tasks failed with the following error:

cp: cannot create regular file '/lfs/h2/emc/physics/noscrub/jianping.huang/Bias_correction/aqmv7.0.81/bcdata.202306/interpolated/ozone/2023/forecasts.interp.20230626.00z.nc': Permission denied

This is because my run tried to write the bias-correction files to your directory. Please resolve this issue.

You can find the log file here:

/lfs/h2/emc/lam/noscrub/chan-hoo.jeon/prod_pr_test/expt_dirs/aqm_nco_aqmna13km/log

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, one solution is not to remove the flag if [ "${DO_AQM_SAVE_AIRNOW_HIST}" = "TRUE" ]; then from the script (Ln 217 in the current bias-correction ex-scripts).

@JianpingHuang-NOAA
Copy link
Author

@chan-hoo This "if" has caused some trouble when I tested. In fact, this "if" is not necessary. If anyone from the community wants to run the bias correction, they have to copy all the training data for the individual runs. If people does not want to run the bias correction, it is not necessary to set it as "F". So in any case, this "if " option is not necessary. The parameter of "DO_AQM_SAVE_AIRNOW_HIST" is not needed in the config.yaml or machine.yaml files.

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, the problem is that the sample case fails in my run because the bias-correction files are set to be written in your directory. I don't understand what problems this flag causes. This is just a flag. You can turn it on/off in config.yaml. Anyhow, if you don't want to use this flag, please provide another solution. I can approve this PR only when it completes without any failures.

@JianpingHuang-NOAA
Copy link
Author

@chan-hoo Since the flag, DO_AQM_SAVE_AIRNOW_HIST, has been deleted from the machine and config yaml files, we need to change everything back which we do not prefer. In order to support your test, can you change the following line by pointing to your personal location in wcoss2.yaml at ~/ush/machine

COMOUTbicor: /lfs/h2/emc/physics/noscrub/jianping.huang/Bias_correction/aqmv7.0.81
=->
COMOUTbicor: /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0.81

Or is there any way of using ${USER} rather than a specific user name in above line?

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, I think you are confused between AQM_AIRNOW_HIST_DIR and DO_AQM_SAVE_AIRNOW_HIST. They are not the same. AQM_AIRNOW_HIST_DIR is the path to airnow data and you changed this to COMINbicor/COMOUTbicor. I agree with this change. DO_AQM_SAVE_AIRNOW_HIST is the flag to determine where or not we save the airnow output data to COMOUTbicor. You didn't remove DO_AQM_SAVE_AIRNOW_HIST from the script but remove its description. If so, do you agree to keep this flag (not AQM_AIRNOW_HIST_DIR but DO_AQM_SAVE_AIRNOW_HIST)?

@chan-hoo
Copy link
Collaborator

chan-hoo commented Jun 28, 2023

@JianpingHuang-NOAA, this flag DO_AQM_SAVE_AIRNOW_HIST (!!! NOT AQM_AIRNOW_HIST_DIR !!!) gives us an option not to save the training data. We don't have to save the training data for testing. If you don't like its name, what about changing its name to DO_AQM_SAVE_BICOR_TRAINING_DATA or similar?

@JianpingHuang-NOAA
Copy link
Author

@chan-hoo
(1) I totally understood the difference between "DO_AQM_SAVE_AIRNOW_HIST" and "AQM_AIRNOW_HIST_DIR". The former is a flag or option of saving AirNow observational data which I proposed to delete it. The latter is the one for saving AirNow observational and model output grids data files (i.e., met_sfc and chem_sfc files), which I propose to change with "COMINbicor" and "COMOUTbicor", depending on whether we read in training data for bias-correction use or writing the training data for future bias-correction use. This is consistent with the current operational model.
(2) If you prefer to keep a flag "DO_AQM_SAVE_AIRNOW_HIST", that is fine but it is really not necessary. I can add that back.
(3) There are quite a lot EE2 compliance issues such as DCOMINbio_dfv, DCOMINchem_lbcs_dfv, DCOMINgefs_dfv, DCOMINpt_src_dfv, etc. None of them are belong to DCOMIN. I plan to open an issue and followed by a PR to address all the EE2 issues after ARL opens a PR on soil NOx emissions.

We received part of reprocessed RAVE data yesterday, we need to set up a retro rerun with the updated workflow as soon as possible.

Hope this is clear. This time let us focuses the BC issue. If you prefer, I will change (2) back. If you agree, I can set up a Google Meeting for further explanation.

Thanks.

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, glad to hear that. Yes, please revive the flag. Regarding the EE2 issues, please open PRs to fix them. I don't think we'll need a google meeting.

@JianpingHuang-NOAA
Copy link
Author

@chan-hoo I just had a quick meeting with Kai on this issue. Even though you set "DO_AQM_SAVE_AIRNOW_HIST=False", you can ignore the error message of saving training data to my location, but your bias correction job will fail again because the bias-correction job can not find the model grid data files (met_sfc and chem_sfc). This means you have to create your personal bias correction training data sets for different cases through "COMINbicor" and "COMOUTbicor" which require to be pointed to your path rather than my path. For my case, I have to do the same way for each experiment or test. This is because the BC results are very sensitive to the training data.

Here are the steps for you to test the bias-correction jobs.
(1) mkdir -p /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0

(2) link all the previous months training data files from my path to yours except for the current month which requires a "cp' rather than "link"

ln -s /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.2022* /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0

ln -s /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.202301 /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0
ln -s /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.202302 /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0
ln -s /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.202303 /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0
ln -s /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.202304 /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0
ln -s /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.202305 /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0

cp -rp /lfs/h2/emc/physics/noscrub/jianping.jhuang/Bias_correction/aqmv7.0/bcdata.202306 /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0

(3) modify COMINbicor and COMOUTbicor in the wcoss2.yaml under ~/ush/wcoss2

COMINbicor: /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0
COMOUTbicor: /lfs/h2/emc/physics/noscrub/chan-hoo.jeon/Bias_correction/aqmv7.0

Thanks.

@chan-hoo
Copy link
Collaborator

@JianpingHuang-NOAA, @KaiWang-NOAA, the workflow doesn't have the regression tests of the ufs-weather-model. Instead, it has WE2E tests to verify a new change. Simply, this means to run the sample configuration such as config.aqm.nco.realtime.yaml with a minor change of ACCOUNT and MACHINE. This is because the workflow is not just for an individual and anyone who wants to make a change to the workflow can test the change easily. As you described above, without this flag, very complicated steps are required to run the sample case.

@chan-hoo chan-hoo merged commit 38b8646 into ufs-community:production/AQM.v7 Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants