
Add multi-gpu support for cupy scheme #5007

Merged
merged 4 commits into gwastro:master from add-cupy-multi-gpu-support on Jan 17, 2025

Conversation

@mj-will (Contributor) commented Jan 17, 2025

This PR adds support for using multiple GPUs via MPI with the cupy backend. It is a follow-on to #4952.

Standard information about the request

This is a: new feature

This change affects: in theory, any code that uses cupy schemes, but since the cupy scheme is relatively new, I don't believe it is being used in production

This change changes: GPU support

This change:

  • does not have unit tests as I don't think we can test multi-GPU support in the CI.
  • follows style guidelines (See e.g. PEP8)
  • has been proposed using the contribution guidelines

This change will: N/A

Motivation

The current cupy scheme is limited to a single GPU, but using multiple GPUs, when they are available, can further accelerate analyses.

Contents

I've added logic to check whether MPI is being used and, if so, to set the device number based on the MPI rank and the total number of devices visible to cupy (a minimal sketch follows).
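
For reference, here's a minimal sketch of that rank-to-device mapping; the merged code may differ in its details. use_mpi and getDeviceCount follow the signatures that appear in the snippets later in this thread.

import cupy.cuda
from pycbc.pool import use_mpi

# use_mpi returns (using_mpi, size, rank); require_mpi=False means this
# also works in non-MPI runs.
do_mpi, _, rank = use_mpi(require_mpi=False, log=False)
if do_mpi:
    # Cycle ranks across the devices visible to cupy (as restricted by
    # CUDA_VISIBLE_DEVICES), so each process gets its own GPU.
    device_num = rank % cupy.cuda.runtime.getDeviceCount()
else:
    device_num = 0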

Links to any issues or associated PRs

Follow-on from #4952.

Testing performed

I've tested this with the pre-merger work that was used when testing #4952.

Additional notes

In the current version, if the user specifies a device number, this takes priority over the MPI-based logic. We could consider swapping this so the value is ignored when using MPI (a hypothetical version of that swap is sketched below).
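
For concreteness, that swap could look something like this (pick_device is an illustrative helper, not code from this PR):

import logging

import cupy.cuda
from pycbc.pool import use_mpi

logger = logging.getLogger('pycbc.scheme')

def pick_device(device_num=None):
    # Hypothetical alternative: under MPI the rank-based choice always
    # wins, and any user-supplied device_num is ignored with a warning.
    do_mpi, _, rank = use_mpi(require_mpi=False, log=False)
    if do_mpi:
        if device_num is not None:
            logger.warning("MPI is enabled; ignoring device_num=%s", device_num)
        return rank % cupy.cuda.runtime.getDeviceCount()
    return 0 if device_num is None else device_num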

I based this implementation on what is described in this blogpost: https://blog.hpc.qmul.ac.uk/strategies-multi-node-gpu/#mpi-process-for-each-gpu-pure-mpi-approach

  • The author of this pull request confirms they will adhere to the code of conduct

@spxiwh (Contributor) commented Jan 17, 2025

GitHub Copilot suggested the following to allow the user to exclude GPUs (e.g. if someone else is using one heavily) via an environment variable:

import os
import logging

import cupy
import cupy.cuda

from .pool import use_mpi

logger = logging.getLogger('pycbc.scheme')


# `Scheme` is the base class already defined in pycbc/scheme.py.
class CUPYScheme(Scheme):
    """Scheme for using CUPY.

    Supports using CUPY with MPI. If MPI is enabled, will use all available
    devices. The environment variable `CUDA_VISIBLE_DEVICES` can be used to
    restrict the devices used.

    Parameters
    ----------
    device_num : int, optional
        The device number to use. If not provided, will use the default, 0.
        Should not be provided when using MPI to parallelize across devices.
    """
    def __init__(self, device_num=None):
        do_mpi, _, rank = use_mpi(require_mpi=False, log=False)

        if device_num is not None and do_mpi:
            logger.warning("MPI is enabled, but a device number was provided.")

        if device_num is None and do_mpi:
            device_num = self.assign_gpu(rank)
            logger.debug("MPI enabled, using CUDA device %s", device_num)

        self.device_num = device_num
        self.cuda_device = cupy.cuda.Device(self.device_num)

    def assign_gpu(self, rank):
        # Skip any devices listed in CUDA_EXCLUDE_DEVICES and share the
        # remaining ones across MPI ranks round-robin.
        exclude_devices = os.getenv('CUDA_EXCLUDE_DEVICES', '').split(',')
        exclude_devices = [int(dev) for dev in exclude_devices if dev.isdigit()]
        available_devices = [
            i for i in range(cupy.cuda.runtime.getDeviceCount())
            if i not in exclude_devices
        ]

        if not available_devices:
            raise RuntimeError("No available GPUs found.")

        return available_devices[rank % len(available_devices)]

    def __enter__(self):
        super().__enter__()
        self.cuda_device.__enter__()

    def __exit__(self, *args):
        super().__exit__(*args)
        self.cuda_device.__exit__(*args)

@mj-will (Contributor, Author) commented Jan 17, 2025

GitHub Copilot suggested the following to allow the user to exclude GPUs (e.g. if someone else is using one heavily) via an environment variable:

I might be missing something, but I think Copilot is over-engineering this, possibly because it doesn't know how CUDA device IDs work in cupy (the cupy ID doesn't necessarily match the physical ID). I think what you're describing is already supported with the changes I added.

CUDA has an environment variable, CUDA_VISIBLE_DEVICES, that defines which CUDA devices are visible, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3, and I think this enables what you describe:

  • If CUDA_VISIBLE_DEVICES=0,1,2,3, cupy would see 4 devices with IDs 0-3.
  • If CUDA_VISIBLE_DEVICES=0,1,3, cupy would see 3 devices with IDs 0-2.

So with the current code, one could use the second option to exclude, say, physical device 2, and cupy would see no difference.

Example

Here's an example that tries to put an array on the second GPU for different values of CUDA_VISIBLE_DEVICES. You can see how, when I exclude physical device 0, the cupy ID for physical device 1 becomes 0, and I get an error if I try to use device 1.


$ nvidia-smi
Fri Jan 17 13:50:15 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40                     Off |   00000000:21:00.0 Off |                    0 |
| N/A   32C    P8             35W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40                     Off |   00000000:81:00.0 Off |                    0 |
| N/A   33C    P8             36W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
$ echo $CUDA_VISIBLE_DEVICES
0,1
$ python -c "import cupy; import cupy.cuda; print(cupy.cuda.runtime.getDeviceCount()); cupy.cuda.Device(1).use(); a = cupy.array([1])"
2
$ export CUDA_VISIBLE_DEVICES=1
$ python -c "import cupy; import cupy.cuda; print(cupy.cuda.runtime.getDeviceCount()); cupy.cuda.Device(1).use(); a = cupy.array([1])"
1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cupy/cuda/device.pyx", line 185, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 191, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 398, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
$ python -c "import cupy; import cupy.cuda; print(cupy.cuda.runtime.getDeviceCount()); cupy.cuda.Device(0).use(); a = cupy.array([1])"

@spxiwh (Contributor) commented Jan 17, 2025

Okay, I think Copilot is just reflecting my own limited knowledge here. LGTM!

@spxiwh merged commit ebbafae into gwastro:master on Jan 17, 2025
30 checks passed
@mj-will deleted the add-cupy-multi-gpu-support branch on January 21, 2025