Offload mods for crusher development #633

Merged
merged 1 commit into from
Jun 28, 2022

Conversation

@mewall (Collaborator) commented Jun 23, 2022

o Modified CMake scripts:
- Use BML_OMP_OFFLOAD for both NVIDIA and AMD,
needed due to commit #8a7df493
- Use the FindCUDAToolkit module instead of the deprecated FindCUDA
- Update to CMake version 3.17, which is needed for FindCUDAToolkit
- Consolidated the logic for CUDA, HIP, and associated libraries
for various types of device builds under control of BML_USE_DEVICE
- Added BML_OFFLOAD_ARCH with options NVIDIA and AMD

o Change crusher build script to use new CMakeLists.txt and build.sh

o Modified offload regions to address the bml_multiply_x2()
Fortran test failure (hang)
- Move temporary working arrays all_ix, all_jx, and all_x from stack to heap
- This eliminated the hang, although the reason is not yet understood
- Similar changes made to other add, multiply offload regions
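
As an illustration of the stack-to-heap change described above, here is a minimal sketch in C. It is not the bml code: the function name `scaled_row_sums`, its parameters, and the row-sum computation are hypothetical, and only the scratch array name `all_x` is taken from the description. It shows a scratch buffer allocated on the heap and mapped into an OpenMP target region, where a stack-declared array might otherwise have been used. The sketch makes no claim about why the original hang occurred.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical example, not taken from bml.
 * Each row i gets its own slice of the scratch buffer all_x.
 * The row is scaled by 2 into the slice, then summed into out[i].
 * Returns 0 on success, -1 on bad sizes or allocation failure.
 */
int scaled_row_sums(int n, int width, const double *a, double *out)
{
    if (n <= 0 || width <= 0) {
        return -1;
    }

    size_t len = (size_t)n * (size_t)width;

    /* Heap allocation, instead of a stack array such as
     * `double all_x[n * width];`. */
    double *all_x = calloc(len, sizeof *all_x);
    if (all_x == NULL) {
        return -1;
    }

    /* If the compiler is run without OpenMP support, the pragma is
     * ignored and the loop runs on the host with the same result. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:len]) map(alloc: all_x[0:len]) map(from: out[0:n])
    for (int i = 0; i < n; i++) {
        double *x = &all_x[(size_t)i * (size_t)width];
        double sum = 0.0;
        for (int j = 0; j < width; j++) {
            x[j] = 2.0 * a[(size_t)i * (size_t)width + j];
            sum += x[j];
        }
        out[i] = sum;
    }

    free(all_x);
    return 0;
}
```

The scratch buffer is mapped with `map(alloc: ...)` because every element is written on the device before it is read, so no host-to-device copy is needed for it.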

@jeanlucf22 (Collaborator) left a comment

Does any other build script need to be adapted?

@mewall mewall force-pushed the offload_mods branch 5 times, most recently from dd9b44d to 7572bec Compare June 23, 2022 20:43
  o Modified CMake scripts:
    - Use BML_OMP_OFFLOAD for both NVIDIA and AMD,
      needed due to commit #8a7df493
    - Use the FindCUDAToolkit module instead of the deprecated FindCUDA
    - Update to CMake version 3.17, which is needed for FindCUDAToolkit
    - Consolidated the logic for CUDA, HIP, and associated libraries
      for various types of device builds under control of BML_USE_DEVICE
    - Added BML_OFFLOAD_ARCH with options NVIDIA and AMD

  o Change crusher and spock build scripts accordingly

  o Modified offload regions to address the bml_multiply_x2()
    Fortran test failure (hang)
    - Move temporary working arrays all_ix, all_jx, and all_x from stack to heap
    - This eliminated the hang, although the reason is not yet understood
    - Similar changes made to other add, multiply offload regions
@mewall (Collaborator, Author) commented Jun 27, 2022 via email

@nicolasbock (Collaborator) commented

Thanks @mewall!

@nicolasbock nicolasbock merged commit b4d863a into master Jun 28, 2022
@nicolasbock nicolasbock deleted the offload_mods branch June 28, 2022 05:42