
Examples/Documentation on deploying function with GPU resource? #2099

Open
safurrier opened this issue Feb 12, 2021 · 3 comments

Comments

@safurrier

I haven't been able to find much documentation or examples on enabling GPU usage on functions.

I'm trying to deploy a function that requires a GPU. On the k8s cluster, only some specific nodes have GPUs. If I request a GPU in the function config by setting it in the resource limits like this:

  resources:
    limits:
      nvidia.com/gpu: 1

I get an error on deployment for insufficient resources:

0/X nodes are available: X Insufficient nvidia.com/gpu.

I tried adding a nodeSelector to the function config similar to how you would in a deployment manifest:

spec:
  nodeSelector:
    accelerator: nvidia

But that didn't work. I can spin up a pod normally using the steps above (specifying GPU limit in resource and applying node selector to use the GPU enabled node) and can access the GPU just fine.
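Putting the two pieces together, the full function config I would expect to work looks roughly like this (a sketch: the metadata name, handler, and the accelerator: nvidia label are from my setup, not anything Nuclio requires):

```yaml
apiVersion: "nuclio.io/v1beta1"
kind: NuclioFunction
metadata:
  name: my-gpu-function
spec:
  runtime: python:3.6
  handler: my-gpu-function:handler
  resources:
    limits:
      nvidia.com/gpu: 1      # request one GPU from the scheduler
  nodeSelector:
    accelerator: nvidia      # restrict scheduling to GPU-labeled nodes
```

The nodeSelector values here have to match whatever labels are actually on the GPU nodes in the cluster.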

For reference I'm deploying via nuctl with a command like this:

nuctl deploy my-gpu-function \
    --kubeconfig ~/.kube/config \
    -f functions/my-gpu-function.yaml \
    -p functions/my-gpu-function.py \
    --runtime python:3.6

Is there something I'm missing? Also, will this require building a custom image from a CUDA-enabled base image, or will Nuclio detect that a GPU is requested and use a GPU-compatible image automatically?

@jahaniam

It is up to you to make sure your image has CUDA.
Adding nvidia.com/gpu: 1 only adds --gpus=all to the docker run command.
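One way to get CUDA into the image is to point the function build at a CUDA base image in the config. A sketch, assuming the standard build.baseImage option; the exact image tag is just an example and needs to match your driver and framework versions:

```yaml
spec:
  build:
    # base image must ship the CUDA/cuDNN libraries your code needs
    baseImage: nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
  resources:
    limits:
      nvidia.com/gpu: 1
```

Nuclio then layers the runtime and your handler on top of that base, so the GPU libraries are available to the function at run time.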

@jahaniam

I have run GPU functions on a local machine, if that helps:
cvat-ai/cvat#2714

I needed to add --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' and also limit the GPU memory use inside my function to make it work robustly with TensorFlow; otherwise cuDNN sometimes failed to initialize.
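The same trigger setting can also live in the function config instead of the nuctl flag. A sketch, assuming the usual Nuclio HTTP trigger shape:

```yaml
spec:
  triggers:
    myHttpTrigger:
      kind: http
      maxWorkers: 1    # one worker, so the GPU isn't contended by concurrent invocations
```

Capping maxWorkers at 1 matters here because each concurrent worker would otherwise try to grab GPU memory for its own TensorFlow session.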

@jahaniam

See also:
#1788
