Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update TC Analysis template #632

Merged
merged 4 commits into from
Dec 10, 2024
Merged

Update TC Analysis template #632

merged 4 commits into from
Dec 10, 2024

Conversation

forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Oct 16, 2024

Issue resolution

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

  • Make tc_analysis.bash more closely match the file in E3SM Diags.

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request introduces an important feature or bug fix that we must test often. I have updated the weekly-test configuration files, not just a "min-case" one.
  • Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy/conda, not just an import statement.

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commits checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

@forsyth2 forsyth2 self-assigned this Oct 16, 2024
@forsyth2 forsyth2 force-pushed the tc-analysis-updates branch from d23fe05 to ea61bc6 Compare October 16, 2024 01:44
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 16, 2024

TODO:

  • Remove scratch parameter elsewhere it appears. Will need to stay in default.ini for backwards compatibility.
  • Test the tc_analysis min-case cfgs (currently using v2 data)
  • Switch those min-case cfgs back to using v3 data and test again (or have v2 and v3 versions of those min-case tests)

@forsyth2
Copy link
Collaborator Author

@chengzhuzhang The relevant code changes are in zppy/templates/tc_analysis.bash. Could you please review those to make sure they match up with E3SM-Project/e3sm_diags#824 as intended?

I tested all the min-case cfg files on v2 data and everything was fine. However, I ran into some issues on v3. min_case_e3sm_diags_tc_analysis_parallel and min_case_e3sm_diags_tc_analysis_mvm_2 resulted in ERROR (1) for the tc_analysis job. ERROR (1) means the dat file is empty. grep -in error <.o file> doesn't return anything in either case. I'm not sure why this would only happen on some of the runs.

If you'd like to try this out yourself:

Testing directions
# Check out the branch
git fetch upstream
git checkout -b tc-analysis-updates upstream/tc-analysis-updates

# Set up e3sm diags dev environment
cd <e3sm_diags directory>
git checkout main
git fetch upstream
git reset --hard upstream/main
git log # Should match https://github.com/E3SM-Project/e3sm_diags/commits/main
mamba clean --all
mamba env create -f conda-env/dev.yml -n e3sm_diags_<date>
conda activate e3sm_diags_<date>
pip install .

# Set up zppy dev environment
cd <zppy directory>
mamba clean --all
mamba env create -f conda/dev.yml -n zppy_dev_pr632
conda activate zppy_dev_pr632
pip install .

# This block is only if you make further changes to the code
pre-commit run --all-files
pip install .
python -u -m unittest tests/test_*.py # Run unit tests

# Set up the testing files
emacs tests/integration/utils.py # set `UNIQUE_ID = "test-pr632"` and under `get_chrysalis_expansions`: `"diags_environment_commands": "source <path to conda.sh>; conda activate e3sm_diags_<date>",`
python tests/integration/utils.py # Actually generate the testing files

# Run zppy. These usually take 5-10 minutes to run in full.
zppy -c tests/integration/generated/test_min_case_e3sm_diags_tc_analysis_parallel_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_e3sm_diags_tc_analysis_mvm_1_chrysalis.cfg
# Wait for _1 to finish and then run:
zppy -c tests/integration/generated/test_min_case_e3sm_diags_tc_analysis_mvm_2_chrysalis.cfg

# Check results
cd /lcrc/group/e3sm/<username>/zppy_min_case_e3sm_diags_tc_analysis_parallel_output/test-pr632/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
cd /lcrc/group/e3sm/<username>/zppy_min_case_e3sm_diags_tc_analysis_mvm_2_output/test-pr632/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status

@forsyth2
Copy link
Collaborator Author

The relevant code changes are in zppy/templates/tc_analysis.bash. Could you please review those to make sure they match up with E3SM-Project/e3sm_diags#824 as intended?

@chengzhuzhang Can you please review the changes in zppy/templates/tc_analysis.bash? If you approve those, I can merge this PR.

I tested all the min-case cfg files on v2 data and everything was fine. However, I ran into some issues on v3. min_case_e3sm_diags_tc_analysis_parallel and min_case_e3sm_diags_tc_analysis_mvm_2 resulted in ERROR (1) for the tc_analysis job. ERROR (1) means the dat file is empty. grep -in error <.o file> doesn't return anything in either case. I'm not sure why this would only happen on some of the runs.

I tested on v3 data, with this PR's code rebased on main (see details below). It looks like the change at https://github.com/E3SM-Project/zppy/pull/633/files#r1821703792 enables the code to move on past that error. The error is now the error E3SM Diags catches, but I believe that is what we want here, since we didn't change the input. So, I think this PR is ready for review.

Testing details
# Set up e3sm diags dev environment
cd ~/ez/e3sm_diags
git checkout main
git fetch upstream
git reset --hard upstream/main
git log
conda clean --all --y
conda env create -f conda-env/dev.yml -n e3sm_diags_main_20241210
conda activate e3sm_diags_main_20241210
pip install .

# Set up zppy dev environment
cd ~/ez/zppy
conda clean --all --y
conda env create -f conda/dev.yml -n zppy_dev_pr632_20241210
conda activate zppy_dev_pr632
pip install .

pytest tests/test_*.py # 23 passed, 1 warning in 0.40s

# Set up the testing files
emacs tests/integration/utils.py 
# set `UNIQUE_ID = "test-pr632-20241210"`
# under `get_chrysalis_expansions`: `"diags_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_main_20241210",`

python tests/integration/utils.py # Actually generate the testing files
# CFG FILES HAVE BEEN GENERATED FROM TEMPLATES WITH THESE SETTINGS:
# UNIQUE_ID=test-pr632-20241210
# unified_testing=False
# diags_environment_commands=source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_main_20241210
# global_time_series_environment_commands=source <INSERT PATH TO CONDA>/conda.sh; conda activate <INSERT ENV NAME>
# e3sm_to_cmip_environment_commands=
# environment_commands=


# tests/integration/template_min_case_e3sm_diags_tc_analysis.cfg
# tests/integration/template_min_case_e3sm_diags_tc_analysis_mvm_1.cfg
# tests/integration/template_min_case_e3sm_diags_tc_analysis_mvm_2.cfg
# tests/integration/template_min_case_e3sm_diags_tc_analysis_parallel.cfg

# tests/integration/template_min_case_e3sm_diags_tc_analysis_v2.cfg
# tests/integration/template_min_case_e3sm_diags_tc_analysis_v2_mvm_1.cfg
# tests/integration/template_min_case_e3sm_diags_tc_analysis_v2_mvm_2.cfg
# tests/integration/template_min_case_e3sm_diags_tc_analysis_v2_parallel.cfg


# Run zppy. These usually take 5-10 minutes to run in full.
zppy -c tests/integration/generated/test_min_case_e3sm_diags_tc_analysis_parallel_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_e3sm_diags_tc_analysis_mvm_1_chrysalis.cfg
# Wait for _1 to finish and then run:
zppy -c tests/integration/generated/test_min_case_e3sm_diags_tc_analysis_mvm_2_chrysalis.cfg

# Check results
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_tc_analysis_parallel_output/test-pr632-20241210/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_tc_analysis_mvm_2_output/test-pr632-20241210/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
Error details

No viewers were created:

From e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1995-1996_vs_1985-1986.o643604:

2024-12-10 12:29:36,739 [ERROR]: core_parameter.py(_run_diag:343) >> Error in e3sm_diags.driver.tc_analysis_driver
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/e3sm_diags_main_20241210/lib/python3.12/site-packages/e3sm_diags/parameter/core_parameter.py", line 340, in _run_dia\
g
    single_result = module.run_diag(self)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/e3sm_diags_main_20241210/lib/python3.12/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 89, in run_diag
    test_data["metrics"] = generate_tc_metrics_from_te_stitch_file(test_te_file)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/e3sm_diags_main_20241210/lib/python3.12/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 178, in generat\
e_tc_metrics_from_te_stitch_file
    raise ValueError(f"The file {te_stitch_file} is empty.")
ValueError: The file /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_tc_analysis_mvm_2_output/test-pr632-20241210/v3.LR.historical_0051/post/atm/tc-analysis_1995\
_1996/cyclones_stitch_v3.LR.historical_0051_1995_1996.dat is empty.
2024-12-10 12:29:36,763 [WARNING]: e3sm_diags_driver.py(main:378) >> There was not a single valid diagnostics run, no viewer created.
2024-12-10 12:29:36,763 [ERROR]: run.py(run_diags:91) >> Error traceback:
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/e3sm_diags_main_20241210/lib/python3.12/site-packages/e3sm_diags/run.py", line 89, in run_diags
    params_results = main(params)
                     ^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/e3sm_diags_main_20241210/lib/python3.12/site-packages/e3sm_diags/e3sm_diags_driver.py", line 397, in main
    if parameters_results[0].fail_on_incomplete and (
       ~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

We can see this IndexError is actually ultimately caused by the ValueError from E3SM-Project/e3sm_diags@ffbdf28#diff-d256b358c8e18c01cd1de1ac39472fce1ba7c86f66a530ae321e5b8e33526eebR169.

@forsyth2 forsyth2 marked this pull request as ready for review December 10, 2024 18:46
@forsyth2 forsyth2 force-pushed the tc-analysis-updates branch from d7fcbc7 to 5980d80 Compare December 10, 2024 18:49
y1={{ year1 }}
y2={{ year2 }}
Y1="{{ '%04d' % (year1) }}"
Y2="{{ '%04d' % (year2) }}"
result_dir={{ scratch }}/tc-analysis_${Y1}_${Y2}/
result_dir={{ output }}/post/atm/tc-analysis_${Y1}_${Y2}/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 the changes look good to me. With the fix in mesh generation, I think we no longer need extra step to save file in scratch? but it looks like all tc_analysis tests include scratch.

Copy link
Collaborator Author

@forsyth2 forsyth2 Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. I just removed the scratch parameters from the test cfgs in the latest commit.

@forsyth2 forsyth2 merged commit ea0a84d into main Dec 10, 2024
4 checks passed
@forsyth2 forsyth2 deleted the tc-analysis-updates branch December 10, 2024 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants