Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EAMxx: add multi-instance and nbfb testing #6984

Merged
merged 14 commits into from
Feb 21, 2025
Merged

Conversation

mahf708
Copy link
Contributor

@mahf708 mahf708 commented Feb 7, 2025

Adding support for multi-instance capability in EAMxx (a one-liner in mct coupling layer) and adding a customized/specialized MVK tests for climate/nbfb reproducibility in EAMxx. More details to come.

[BFB]

@mahf708 mahf708 added the EAMxx PRs focused on capabilities for EAMxx label Feb 7, 2025
@mahf708 mahf708 requested a review from mkstratos February 7, 2025 21:23
Copy link

github-actions bot commented Feb 7, 2025

PR Preview Action v1.6.0

🚀 View preview at
https://E3SM-Project.github.io/E3SM/pr-preview/pr-6984/

Built to branch gh-pages at 2025-02-17 20:49 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@mahf708 mahf708 requested a review from rljacob February 7, 2025 22:23
@mahf708
Copy link
Contributor Author

mahf708 commented Feb 7, 2025

To trigger:

generate baselines:

./cime/scripts/create_test MVKxx_Ly1.ne4pg2_ne4pg2.F2010-SCREAMv1.pm-cpu_intel.eamxx-mvkpert --generate --baseline-root=$PSCRATCH/e3sm-scratch/test-pr/baselines -b mkvxx_test -o

compare to above-generated baselines:

./cime/scripts/create_test MVKxx_Ly1.ne4pg2_ne4pg2.F2010-SCREAMv1.pm-cpu_intel.eamxx-mvkpert --compare --baseline-root=$PSCRATCH/e3sm-scratch/test-pr/baselines -b mkvxx_test -o

Sample of test output viewable on the web (but not sure if it is meaningful; seems like fields are constants? 🤷 tbd more validation): https://portal.nersc.gov/project/e3sm/mahf708/MVKxx_Ly1.ne4pg2_ne4pg2.F2010-SCREAMv1.pm-cpu_intel.eamxx-mvkpert.C.20250210_142343_isu2io.evv/

@bartgol bartgol requested a review from jgfouca February 10, 2025 16:46
@bartgol
Copy link
Contributor

bartgol commented Feb 10, 2025

Adding @jgfouca since he is Mr. CIME, so he may have thoughts on the SystemTests part.

Copy link
Member

@jgfouca jgfouca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! I think this is our first E3SM-only SystemTest. We should think about moving the other tests that are very E3SM-specific here.

Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jim should review this, but I have one thought.

I see lots of overlap with MVK. So two sub-thoughts:

  • Let's prune what's the same, and only keep the differences
  • Wild idea: would it make sense to not add MVKxx, and just hack our buildnml scripts, so that we read what's in user_nl_eamxx, and perform the corresponding atmchanges? It would be taylored for this test type. We could put a check "if it's MVK test, then parse user_nl_eamxx and do atmchanges"... EDIT: I now see that another big diff is that MVKxx calls ksxx, while MVK relies on this evv external library. If this is the easier thing to do, then so be it. But maybe @jgfouca can tell whether we can cast our problem into something that existing MVK can handle?

EDIT 2: I don't want to burden this PR with more work, adding a feature is already a lot. But looking ahead, maybe we could make the infrastructure more general. E.g., I expect that maybe other new components (like omega) may want a MVK-like test, but may fail to fit in the CIME's MVK approach. Maybe we should create an e3sm-owned MVK approach that delegates to the component's buildnml script the job of modifying the different instances inputs, as well as uses some module in the component's path to collect the output files. That is, remove assumptions in the main scripts about how inputs/outputs files are called/structured. Let the components handle that bit of info.

@mahf708
Copy link
Contributor Author

mahf708 commented Feb 10, 2025

I think Jim should review this, but I have one thought.

I see lots of overlap with MVK. So two sub-thoughts:

  • Let's prune what's the same, and only keep the differences
  • Wild idea: would it make sense to not add MVKxx, and just hack our buildnml scripts, so that we read what's in user_nl_eamxx, and perform the corresponding atmchanges? It would be taylored for this test type. We could put a check "if it's MVK test, then parse user_nl_eamxx and do atmchanges"... EDIT: I now see that another big diff is that MVKxx calls ksxx, while MVK relies on this evv external library. If this is the easier thing to do, then so be it. But maybe @jgfouca can tell whether we can cast our problem into something that existing MVK can handle?

EDIT 2: I don't want to burden this PR with more work, adding a feature is already a lot. But looking ahead, maybe we could make the infrastructure more general. E.g., I expect that maybe other new components (like omega) may want a MVK-like test, but may fail to fit in the CIME's MVK approach. Maybe we should create an e3sm-owned MVK approach that delegates to the component's buildnml script the job of modifying the different instances inputs, as well as uses some module in the component's path to collect the output files. That is, remove assumptions in the main scripts about how inputs/outputs files are called/structured. Let the components handle that bit of info.

I will explain the PR better before merging. The logic in this PR is copied in super large part of two earlier efforts (one is the vanilla default MVK that's designed for EAM and the other is a specialized case for MPASO, called MVKO).

The goal is to hand this off to the infra team and move all of this for better long-term maintainability to this repo (https://github.com/LIVVkit/evv4esm) where the stuff I am overriding with these python files is there. So the goal is to specialize those files there to first-class support EAMxx rather than hack something here. I got access to that repo on Friday, so I (or @mkstratos) could do that once we are happy with the results for EAMxx.

There's on minor issue I'd like to address before declaring this ready for review. I need to figure out how to sidestep the lack of conventional short-term archiving capability in EAMxx (a small part of this test relies on it). After that, I will need to run this test for 12+ months twice to see if it behaves as expected. If all goes well, I will clean up this PR into ~3 commits and rebase, and invite Jim, Luca, Rob, Mike, and Peter for a review. (Luca, I'd like you to help me think about the MCT coupling logic and it how it interacts with this multi-instance capability; the one-liner edit in MCT coupling layer does the job for this PR purposes, but maybe it is confusing and annoying to do it like that? If you want, we should start thinking about this in a more organizational aspect --- the multi-instance capability rely on touching the last file of record (in eam world, that is atm_in; in eamxx, it is scream_input.yaml). For eam, the last file of record gets handled automatically the multi-instance mechanics deep down, but in eamxx, it obviously doesn't and need some manual intervention)

@bartgol
Copy link
Contributor

bartgol commented Feb 10, 2025

Yeah, I'm perfectly fine doing things in-house to get the capability in. But I would really like a nice conversation in the infra team to avoid the proliferation of tests/scripts that are very similar to each other, and promote instead the asbtraction of key concepts, so that components are left in charge only of what specific to them.

As for the shortcoming of mention (the archiving), I am not familiar with that feature. I don't know what I have to do locally, to test what happens, as I have never used it (I am not a model user; not enough, at least). But if you want, we can check together and discuss what needs to change (in eamxx and/or CIME) in order to fix things.

@jgfouca
Copy link
Member

jgfouca commented Feb 10, 2025

Yeah, I think MVK is also E3SM-only for all intents. We could move both and encapsulate some of the duplicated code in this directory.

@rljacob rljacob requested a review from jasonb5 February 10, 2025 18:30
@mahf708 mahf708 force-pushed the mahf708/eamxx/mvk-testing branch from 7882633 to cfd3659 Compare February 10, 2025 23:31
@mahf708 mahf708 marked this pull request as ready for review February 10, 2025 23:31
@mahf708
Copy link
Contributor Author

mahf708 commented Feb 10, 2025

@rljacob @bartgol @jgfouca, this is now ready for integration. PTAL. Note that @PeterCaldwell and I are planning to take this for a spin in terms of validation. For now, this uses a small subset of variables (only four, see components/eamxx/cime_config/SystemTests/ksxx_vars.json) just to get things rolling and it hardcodes the number of ensemble members to 2 (2!) only, also to get things rolling. We intend to test and see what sort of settings are scientifically valid for climate reproducibility testing for EAMxx. More to come, but this is the first technical hurdle to just get things going. See #6984 (comment) for a sample run with output. And refer to comments above about planned activities in terms of moving code around.

@mkstratos: This is ready if you'd like to take a rigorous look. Note that we will easily move stuff to evv once we are ready, most of the eamxx-specific mods are relatively (so lots of duplicated code for now). Because evv gets released as part of e3sm-unified/cime-latest conda envs, I'd like to keep all this duplication here until we figure out a better alternative.

Anyway, I hope this will be my role in getting the software/technical end of this done, and I can go back to the validation aspects now :)

@mahf708 mahf708 added the Testing Anything related to unit/system tests label Feb 10, 2025
@mahf708 mahf708 requested a review from bartgol February 11, 2025 02:18
@rljacob rljacob changed the title EAMxx: multi-instance and nbfb testing EAMxx: add multi-instance and nbfb testing Feb 11, 2025
@ndkeen
Copy link
Contributor

ndkeen commented Feb 13, 2025

Might consider changes to the shell_commands. Not sure the goal, but probably don't want 72 vertical levels.
Start date? COSP? Compute tendencies? Budgets? No need for lambda setting (now default).

@mahf708
Copy link
Contributor Author

mahf708 commented Feb 17, 2025

@rljacob: I'd like this merged soon so that we can evaluate it in the near future. I messaged @mkstratos to review it, and he said he would. There'll be another iteration on the settings (variable names, number of instances, etc.) once the test is scientifically validated.

I'll be mostly out for the week (still ill), but hopefully I don't need to be around for it to be merged. Okay with you if @bartgol or @jgfouca integrate it or do you prefer we wait until I am back online fully?

@rljacob
Copy link
Member

rljacob commented Feb 17, 2025

I'm ok with merging.

@rljacob rljacob closed this Feb 17, 2025
@rljacob rljacob reopened this Feb 17, 2025
Copy link
Contributor

@mkstratos mkstratos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the duplicated EVV code, which we can (re)move later, this looks good to me.

return monthly_avgs


def load_mpas_climatology_ensemble(files, field_name, mask_value=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't used below, but this file will go away once we update EVV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will clean up once we reconcile the eam/eamxx files in evv

table_data = pd.DataFrame(details).T
uc_rejections = (table_data["K-S test p-val"] < args.alpha).sum()
_hdrs = [
"h",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pedantic, but the "h0" here originally refers to null hypothesis acceptance/rejection rather than h0 from the hist files (only changes the text of output tables on the generated webpage)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, I did a blind h0->h because of the archive stuff. Will also fix once we reconcile.

@mahf708
Copy link
Contributor Author

mahf708 commented Feb 20, 2025

@bartgol and/or @jgfouca, could one of you integrate this now? I think it is ready; there will be clean-up activities for two aspects soon:

  • all the py files here will be moved to evv and once we update that, we can rely on native support for eamxx --- this will take care of @mkstratos's comments above
  • the specific settings in the testmod (which can also be moved to the test itself) are tbd (number of instances, specific settings to test, etc.) --- this will take care of @ndkeen's comments above

@jgfouca
Copy link
Member

jgfouca commented Feb 20, 2025

@mahf708 , yes, I can start integrating.

@bartgol
Copy link
Contributor

bartgol commented Feb 20, 2025

Ok. Just quickly re-running failed v1 checks, to be sure. Should just be a matter of old baselines.

jgfouca added a commit that referenced this pull request Feb 20, 2025
Adding support for multi-instance capability in EAMxx (a one-liner in
mct coupling layer) and adding a customized/specialized MVK tests for
climate/nbfb reproducibility in EAMxx. More details to come.

[BFB]
@jgfouca
Copy link
Member

jgfouca commented Feb 20, 2025

Merged to next.

@jgfouca jgfouca merged commit fdf19f2 into master Feb 21, 2025
40 of 56 checks passed
@jgfouca jgfouca deleted the mahf708/eamxx/mvk-testing branch February 21, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EAMxx PRs focused on capabilities for EAMxx Testing Anything related to unit/system tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants