
Examples/Documentation on deploying function with GPU resource? #2099

Open
safurrier opened this issue Feb 12, 2021 · 3 comments

Comments

@safurrier

I haven't been able to find much documentation or examples on enabling GPU usage on functions.

I'm trying to deploy a function that requires a GPU. On the k8s cluster, only some specific nodes have GPUs. If I request a GPU in the function config by setting it in the resource limits like this:

  resources:
    limits:
      nvidia.com/gpu: 1

I get an error on deployment for insufficient resources:

0/X nodes are available: X Insufficient nvidia.com/gpu.

I tried adding a nodeSelector to the function config similar to how you would in a deployment manifest:

spec:
  nodeSelector:
    accelerator: nvidia

But that didn't work. I can spin up a pod normally using the steps above (specifying GPU limit in resource and applying node selector to use the GPU enabled node) and can access the GPU just fine.
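Putting the two pieces together, the full function config I would expect to work looks roughly like this (a sketch: the metadata name, handler, and the accelerator: nvidia label are from my setup, not anything Nuclio requires):

```yaml
apiVersion: "nuclio.io/v1beta1"
kind: NuclioFunction
metadata:
  name: my-gpu-function
spec:
  runtime: python:3.6
  handler: my-gpu-function:handler
  resources:
    limits:
      nvidia.com/gpu: 1      # request one GPU from the scheduler
  nodeSelector:
    accelerator: nvidia      # restrict scheduling to GPU-labeled nodes
```

The nodeSelector values here have to match whatever labels are actually on the GPU nodes in the cluster.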

For reference I'm deploying via nuctl with a command like this:

nuctl deploy my-gpu-function \
    --kubeconfig ~/.kube/config \
    -f functions/my-gpu-function.yaml \
    -p functions/my-gpu-function.py \
    --runtime python:3.6

Is there something I'm missing? Also, will this require building a custom image from a CUDA-enabled base image, or will Nuclio detect that a GPU is requested and use a GPU-compatible image automatically?

@jahaniam

It is up to you to make sure your image has CUDA.
Adding nvidia.com/gpu: 1 only adds --gpus=all to the docker run command.
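One way to get CUDA into the image is to point the function build at a CUDA base image in the config. A sketch, assuming the standard build.baseImage option; the exact image tag is just an example and needs to match your driver and framework versions:

```yaml
spec:
  build:
    # base image must ship the CUDA/cuDNN libraries your code needs
    baseImage: nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
  resources:
    limits:
      nvidia.com/gpu: 1
```

Nuclio then layers the runtime and your handler on top of that base, so the GPU libraries are available to the function at run time.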

@jahaniam

I have run GPU functions on a local machine, if that helps:
cvat-ai/cvat#2714

I needed to add --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' and also limit the GPU memory use inside my function to make it work robustly with TensorFlow; otherwise cuDNN sometimes failed to initialize.
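The same trigger setting can also live in the function config instead of the nuctl flag. A sketch, assuming the usual Nuclio HTTP trigger shape:

```yaml
spec:
  triggers:
    myHttpTrigger:
      kind: http
      maxWorkers: 1    # one worker, so the GPU isn't contended by concurrent invocations
```

Capping maxWorkers at 1 matters here because each concurrent worker would otherwise try to grab GPU memory for its own TensorFlow session.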

@jahaniam

See also:
#1788
