Add support for nvidia gpu limit #346
Conversation
sdk/python/kfp/compiler/compiler.py
Outdated
  template['container']['resources'] = {}
- if op.memory_limit or op.cpu_limit:
+ if op.memory_limit or op.cpu_limit or op.nvidia_gpu_limit:
For each resource there is a "limit" and a "request". Do you also have the change for set_gpu_request()?
I tried `requests/gpu`, but it doesn't work as expected. According to https://cloud.google.com/kubernetes-engine/docs/how-to/gpus, only the limit is mentioned, so GPU requests appear to be unsupported.
Found this: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
"GPUs are only supposed to be specified in the limits section, which means:
You can specify GPU limits without specifying requests because Kubernetes will use the limit as the request value by default.
You can specify GPU in both limits and requests but these two values must be equal.
You cannot specify GPU requests without specifying limits."
The naming in the doc is misleading to me: `limits` sounds like a max and `requests` like a min, but for GPUs the limit actually acts as both. This is different from cpu and memory.
Is it a good idea to stop the confusion at the DSL side, say by just calling it `set_gpu()`?
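The Kubernetes behavior quoted above can be illustrated with a small sketch (not the actual compiler code; the helper name and parameters are assumptions) of how a GPU limit would be emitted into a container's resources dict:

```python
def build_resources(cpu_limit=None, memory_limit=None, nvidia_gpu_limit=None):
    """Build a Kubernetes-style resources dict.

    Per the Kubernetes scheduling-gpus doc, GPUs go only under 'limits';
    Kubernetes then uses the limit as the request value by default.
    """
    limits = {}
    if cpu_limit:
        limits['cpu'] = cpu_limit
    if memory_limit:
        limits['memory'] = memory_limit
    if nvidia_gpu_limit:
        # Extended resource name used by the NVIDIA device plugin.
        limits['nvidia.com/gpu'] = nvidia_gpu_limit
    return {'limits': limits} if limits else {}
```

For example, `build_resources(nvidia_gpu_limit='1')` yields `{'limits': {'nvidia.com/gpu': '1'}}`, with no `requests` section at all.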
I feel it's better to just leave it open and let users specify the values via `add_resource_limit` or `add_resource_request`. It will be easier for users to follow the instructions in the GKE documentation, and less maintenance for us if things change in the future, e.g. if GKE starts supporting GPUs from other vendors.
SG. Add the link to docstring then?
Ah. You removed the explicit gpu API. Then it is fine.
hmm... maybe it is also worthwhile to keep `set_gpu_limit()`? A lot of Python users (notebook users) rely on IntelliSense, and an explicit GPU setter helps them find it easily. I would also prefer a `set_gpu*` prefix so autocomplete can help here. What do you think of `set_gpu_nvidia_limit()`? Other suggestions? With only `set_resource_limit` plus the cpu/memory setters, it is harder for users to tell whether and how to set a GPU.
Done
sdk/python/kfp/dsl/_container_op.py
Outdated
@@ -155,6 +164,16 @@ def set_cpu_limit(self, cpu):
     self._validate_cpu_string(cpu)
     self.cpu_limit = cpu

+  def set_nvidia_gpu_limit(self, gpu):
Not sure how many types of GPUs we will support, but I am under the impression that NVIDIA is the mainstream one. Would it be better to do:
`set_gpu_limit(self, num_gpu, vendor='nvidia')`?
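The vendor-parameterized signature suggested above could look roughly like this (a hypothetical sketch, not the merged implementation; the class shape and attribute names are assumptions):

```python
class ContainerOp:
    """Minimal stand-in for the DSL container op (illustration only)."""

    def __init__(self):
        self.resource_limits = {}

    def set_gpu_limit(self, num_gpu, vendor='nvidia'):
        """Set a GPU limit, mapping the vendor to its Kubernetes
        extended-resource name, e.g. 'nvidia.com/gpu' or 'amd.com/gpu'."""
        self.resource_limits['%s.com/gpu' % vendor] = str(num_gpu)
        return self
```

With a default of `vendor='nvidia'` the common case stays a one-argument call, while other vendors remain reachable without a new method per GPU type.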
I hope we don't bake that assumption in here; e.g. a user might want to use an AMD GPU. The API should be extensible.
I would also hope that we just expose the Kubernetes resource limits directly. It's hard to predict what resources users will want in the future.
+1 for exposing resource limits and requests directly as generic key/value pairs. We should keep the existing cpu and memory setters, which are used frequently.
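The generic key/value API proposed here could be sketched as follows (method and attribute names are assumptions for illustration, not the final SDK API):

```python
class ContainerOp:
    """Minimal stand-in for the DSL container op (illustration only)."""

    def __init__(self):
        self.resource_limits = {}
        self.resource_requests = {}

    def add_resource_limit(self, resource_name, value):
        """Add a limit for any Kubernetes resource, e.g. 'nvidia.com/gpu'."""
        self.resource_limits[resource_name] = value
        return self

    def add_resource_request(self, resource_name, value):
        """Add a request for any Kubernetes resource, e.g. 'cpu'."""
        self.resource_requests[resource_name] = value
        return self
```

Returning `self` allows chaining, e.g. `op.add_resource_limit('nvidia.com/gpu', '1').add_resource_request('cpu', '500m')`, and any future extended resource works without SDK changes.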
Okay.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: qimingj. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
Great work! Super happy to see this implemented so quickly. Thanks!!