
Add multi-gpu support for cupy scheme #5007

Merged
merged 4 commits into gwastro:master from add-cupy-multi-gpu-support on Jan 17, 2025

Conversation

@mj-will (Contributor) commented Jan 17, 2025

This PR adds support for using multiple GPUs via MPI with the cupy backend. It is a follow-on to #4952.

Standard information about the request

This is a: new feature

This change affects: in theory, any code that uses cupy schemes, but since the cupy scheme is relatively new, I don't believe it is being used in production

This change changes: GPU support

This change:

  • does not have unit tests as I don't think we can test multi-GPU support in the CI.
  • follows style guidelines (See e.g. PEP8)
  • has been proposed using the contribution guidelines

This change will: N/A

Motivation

The current cupy scheme is limited to a single GPU, but using multiple GPUs, when they are available, can further accelerate analyses.

Contents

I've added logic to check whether MPI is being used and, if so, to set the device number based on the MPI rank and the total number of devices visible to cupy (a minimal sketch follows).
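
For reference, here's a minimal sketch of that rank-to-device mapping; the merged code may differ in its details. use_mpi and getDeviceCount follow the signatures that appear in the snippets later in this thread.

import cupy.cuda
from pycbc.pool import use_mpi

# use_mpi returns (using_mpi, size, rank); require_mpi=False means this
# also works in non-MPI runs.
do_mpi, _, rank = use_mpi(require_mpi=False, log=False)
if do_mpi:
    # Cycle ranks across the devices visible to cupy (as restricted by
    # CUDA_VISIBLE_DEVICES), so each process gets its own GPU.
    device_num = rank % cupy.cuda.runtime.getDeviceCount()
else:
    device_num = 0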

Links to any issues or associated PRs

Follow-on from #4952.

Testing performed

I've tested this with the pre-merger work that was used when testing #4952.

Additional notes

In the current version, if the user specifies a device number, this takes priority over the MPI-based logic. We could consider swapping this so the value is ignored when using MPI (a hypothetical version of that swap is sketched below).
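
For concreteness, that swap could look something like this (pick_device is an illustrative helper, not code from this PR):

import logging

import cupy.cuda
from pycbc.pool import use_mpi

logger = logging.getLogger('pycbc.scheme')

def pick_device(device_num=None):
    # Hypothetical alternative: under MPI the rank-based choice always
    # wins, and any user-supplied device_num is ignored with a warning.
    do_mpi, _, rank = use_mpi(require_mpi=False, log=False)
    if do_mpi:
        if device_num is not None:
            logger.warning("MPI is enabled; ignoring device_num=%s", device_num)
        return rank % cupy.cuda.runtime.getDeviceCount()
    return 0 if device_num is None else device_num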

I based this implementation on what is described in this blogpost: https://blog.hpc.qmul.ac.uk/strategies-multi-node-gpu/#mpi-process-for-each-gpu-pure-mpi-approach

  • The author of this pull request confirms they will adhere to the code of conduct

@spxiwh (Contributor) commented Jan 17, 2025

GitHub Copilot suggested the following to allow the user to exclude GPUs (e.g. if someone else is using one heavily) via an environment variable:

import os
import logging

import cupy
import cupy.cuda

from .pool import use_mpi

logger = logging.getLogger('pycbc.scheme')


# `Scheme` is the base class already defined in pycbc/scheme.py.
class CUPYScheme(Scheme):
    """Scheme for using CUPY.

    Supports using CUPY with MPI. If MPI is enabled, will use all available
    devices. The environment variable `CUDA_VISIBLE_DEVICES` can be used to
    restrict the devices used.

    Parameters
    ----------
    device_num : int, optional
        The device number to use. If not provided, will use the default, 0.
        Should not be provided when using MPI to parallelize across devices.
    """
    def __init__(self, device_num=None):
        do_mpi, _, rank = use_mpi(require_mpi=False, log=False)

        if device_num is not None and do_mpi:
            logger.warning("MPI is enabled, but a device number was provided.")

        if device_num is None and do_mpi:
            device_num = self.assign_gpu(rank)
            logger.debug("MPI enabled, using CUDA device %s", device_num)

        self.device_num = device_num
        self.cuda_device = cupy.cuda.Device(self.device_num)

    def assign_gpu(self, rank):
        # Skip any devices listed in CUDA_EXCLUDE_DEVICES and share the
        # remaining ones across MPI ranks round-robin.
        exclude_devices = os.getenv('CUDA_EXCLUDE_DEVICES', '').split(',')
        exclude_devices = [int(dev) for dev in exclude_devices if dev.isdigit()]
        available_devices = [
            i for i in range(cupy.cuda.runtime.getDeviceCount())
            if i not in exclude_devices
        ]

        if not available_devices:
            raise RuntimeError("No available GPUs found.")

        return available_devices[rank % len(available_devices)]

    def __enter__(self):
        super().__enter__()
        self.cuda_device.__enter__()

    def __exit__(self, *args):
        super().__exit__(*args)
        self.cuda_device.__exit__(*args)

@mj-will (Contributor, Author) commented Jan 17, 2025

GitHub Copilot suggested the following to allow the user to exclude GPUs (e.g. if someone else is using one heavily) via an environment variable:

I might be missing something, but I think Copilot is over-engineering this, possibly because it doesn't know how CUDA device IDs work in cupy (the cupy ID doesn't necessarily match the physical ID). I think what you're describing is already supported with the changes I added.

CUDA has an environment variable, CUDA_VISIBLE_DEVICES, that defines which CUDA devices are visible, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3, and I think this enables what you describe:

  • If CUDA_VISIBLE_DEVICES=0,1,2,3, cupy would see 4 devices with IDs 0-3.
  • If CUDA_VISIBLE_DEVICES=0,1,3, cupy would see 3 devices with IDs 0-2.

So with the current code, one could use the second option to exclude, say, physical device 2, and cupy would see no difference.

Example

Here's an example that tries to put an array on the second GPU for different values of CUDA_VISIBLE_DEVICES. You can see how, when I exclude physical device 0, the cupy ID for physical device 1 becomes 0, and I get an error if I try to use device 1.


$ nvidia-smi
Fri Jan 17 13:50:15 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40                     Off |   00000000:21:00.0 Off |                    0 |
| N/A   32C    P8             35W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40                     Off |   00000000:81:00.0 Off |                    0 |
| N/A   33C    P8             36W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
$ echo $CUDA_VISIBLE_DEVICES
0,1
$ python -c "import cupy; import cupy.cuda; print(cupy.cuda.runtime.getDeviceCount()); cupy.cuda.Device(1).use(); a = cupy.array([1])"
2
$ export CUDA_VISIBLE_DEVICES=1
$ python -c "import cupy; import cupy.cuda; print(cupy.cuda.runtime.getDeviceCount()); cupy.cuda.Device(1).use(); a = cupy.array([1])"
1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cupy/cuda/device.pyx", line 185, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 191, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 398, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
$ python -c "import cupy; import cupy.cuda; print(cupy.cuda.runtime.getDeviceCount()); cupy.cuda.Device(0).use(); a = cupy.array([1])"

@spxiwh (Contributor) commented Jan 17, 2025

Okay, I think Copilot is just reflecting my own limited knowledge here. LGTM!

@spxiwh merged commit ebbafae into gwastro:master on Jan 17, 2025
30 checks passed
@mj-will deleted the add-cupy-multi-gpu-support branch on January 21, 2025