Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error handling for GPU code path unimplemented features #193

Merged
merged 7 commits into from
Jan 20, 2025

Conversation

wdeconinck
Copy link
Collaborator

All following changes are required to more gracefully integrate ectrans GPU code paths in atlas via transi.

  • Workaround to avoid some unimplemented gpu-code path features:

    • SETUP_TRANS(LDSPSETUPONLY=.TRUE.)
    • SETUP_TRANS(LDGRIDONLY=.TRUE.)

    The workarounds consists in ignoring the "ONLY" setup and setting everything up with a low truncation or grid resolution. Not ideal but a sufficient temporary solution.
    This avoids special handling in downstream code like Atlas that would otherwise perform the same tricks.

  • Calling of ABORT_TRANS for other unimplemented gpu-code path features:

    • VORDIV_TO_UV
    • SETUP_TRANS(LDLL=.TRUE.)

    Now we don't get obscure errors or segfaults in lower-level code paths

  • Unsupported known gpu code paths called from within transi are now protected explicitly with error return codes rather than hard failures that cannot be recovered from. These error return codes should be removed as features are implemented of course:
    - trans_vordiv_to_UV
    - trans_dirtrans_adj
    - trans_invtrans_adj
    - trans_setup with llatlon=true

    This now means that atlas unit-tests can choose to ignore unimplemented features via exception handling

Copy link
Collaborator

@samhatfield samhatfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending questions above.

Copy link
Collaborator

@lukasm91 lukasm91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with these changes. For the future, in order to fix those (implement LDSETUPONLZ and GRIDONLY), it might help if we could define what they actually mean. At least I always struggled with what it actually means, which variables have to be set, and which don't.

@wdeconinck
Copy link
Collaborator Author

Thanks @lukasm91 I promised I would make some tests for these setups but I got sidetracked mid-way. When I have time I will still create those tests, and perhaps add an option to disable the workarounds added in this PR so we can get an actual failure.

@wdeconinck wdeconinck force-pushed the feature/error_handling branch from 76eb4a2 to e0cfb0c Compare January 16, 2025 08:57
@wdeconinck
Copy link
Collaborator Author

Now rebased on develop

@wdeconinck wdeconinck merged commit a2fc6e7 into develop Jan 20, 2025
22 checks passed
@wdeconinck wdeconinck deleted the feature/error_handling branch January 20, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants