
Add support for nvidia gpu limit #346

Merged — 6 commits merged into master from hongyes/gpu, Nov 22, 2018
Conversation

hongye-sun (Contributor) commented Nov 21, 2018


@hongye-sun hongye-sun requested review from qimingj and removed request for Ark-kun and gaoning777 November 21, 2018 00:55
@hongye-sun hongye-sun self-assigned this Nov 21, 2018
  template['container']['resources'] = {}
- if op.memory_limit or op.cpu_limit:
+ if op.memory_limit or op.cpu_limit or op.nvidia_gpu_limit:
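A minimal sketch of what the change above feeds into, assuming the compiler builds the container template's resources section as plain dicts (the helper name and signature are illustrative, not the PR's actual code; `nvidia.com/gpu` is the standard Kubernetes extended-resource name for NVIDIA GPUs):

```python
def build_resources(memory_limit=None, cpu_limit=None, nvidia_gpu_limit=None):
    """Assemble a Kubernetes container 'resources' dict (illustrative sketch)."""
    resources = {}
    limits = {}
    if memory_limit:
        limits['memory'] = memory_limit
    if cpu_limit:
        limits['cpu'] = cpu_limit
    if nvidia_gpu_limit:
        # GPUs are requested via the extended resource name, limits only.
        limits['nvidia.com/gpu'] = nvidia_gpu_limit
    if limits:
        resources['limits'] = limits
    return resources
```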
Contributor:
For each resource there is a "limit" and a "request". Do you also have the change for set_gpu_request()?

Author (hongye-sun):

I have tried requests/gpu, which doesn't work as expected. According to https://cloud.google.com/kubernetes-engine/docs/how-to/gpus, only the limit is mentioned. It looks like requests are not supported.

Contributor:

Found this: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

"GPUs are only supposed to be specified in the limits section, which means:
You can specify GPU limits without specifying requests because Kubernetes will use the limit as the request value by default.
You can specify GPU in both limits and requests but these two values must be equal.
You cannot specify GPU requests without specifying limits."

The naming in the doc is misleading to me: "limits" sounds like a max and "requests" like a min, but here it actually means both max and min. This is different from cpu and memory.

Would it be a good idea to stop the confusion at the DSL side, say by just calling it set_gpu()?
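The three rules quoted above from the Kubernetes docs can be captured in a small validator (an illustrative helper for this discussion, not part of the PR):

```python
def gpu_spec_ok(resources, gpu='nvidia.com/gpu'):
    """Check a container 'resources' dict against the three GPU rules
    quoted from the Kubernetes scheduling docs (illustrative only)."""
    limit = resources.get('limits', {}).get(gpu)
    request = resources.get('requests', {}).get(gpu)
    if request is not None and limit is None:
        return False  # rule 3: a GPU request without a limit is invalid
    if request is not None and request != limit:
        return False  # rule 2: if both are set, they must be equal
    return True       # rule 1: a limit alone is fine (request defaults to it)
```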

Author (hongye-sun):

I feel it's better to just leave it open and let users specify the values via add_resource_limit or add_resource_request. It will be easier for users to follow the instructions in the GKE documentation, and less maintenance for us if things change in the future, e.g. if GKE starts supporting other GPU vendors.
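The generic key/value API described above can be sketched with a minimal stand-in class (this is not the real kfp ContainerOp, just an illustration of the method shape being discussed):

```python
class ContainerOp:
    """Minimal stand-in for the generic resource API under discussion."""
    def __init__(self, name):
        self.name = name
        self.resource_limits = {}
        self.resource_requests = {}

    def add_resource_limit(self, resource_name, value):
        self.resource_limits[resource_name] = value
        return self  # return self so calls can be chained

    def add_resource_request(self, resource_name, value):
        self.resource_requests[resource_name] = value
        return self

op = ContainerOp('train')
op.add_resource_limit('nvidia.com/gpu', '2')  # GPUs: limit only, per GKE docs
op.add_resource_request('memory', '4G').add_resource_limit('memory', '8G')
```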

Contributor:

SG. Add the link to the docstring then?

Contributor:

Ah. You removed the explicit gpu API. Then it is fine.

Contributor:

Hmm... maybe it is also worthwhile to keep set_gpu_limit()? A lot of Python users (notebook users) rely on IntelliSense, and having an explicit GPU setter helps them find it easily. I would also prefer a set_gpu* prefix so auto-complete can help here. What do you think of set_gpu_nvidia_limit()? Other suggestions? With only set_resource_limit plus the cpu/memory setters, it is harder for users to tell whether and how to set a GPU.

Author (hongye-sun):

Done

@@ -155,6 +164,16 @@ def set_cpu_limit(self, cpu):
self._validate_cpu_string(cpu)
self.cpu_limit = cpu

def set_nvidia_gpu_limit(self, gpu):
Contributor:

Not sure how many types of GPUs we will support, but I am under the impression that nvidia is the mainstream one. Would it be better to do:

set_gpu_limit(self, num_gpu, vendor='nvidia')?
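The vendor-parameterized suggestion maps naturally onto Kubernetes extended-resource names, which take the form `<vendor-domain>/gpu`. A small sketch of the key construction (the helper name is hypothetical; `nvidia.com/gpu` and `amd.com/gpu` are the real resource names used by the respective device plugins):

```python
def gpu_resource_key(vendor='nvidia'):
    # Kubernetes schedules GPUs via extended resources named
    # '<vendor-domain>/gpu', e.g. 'nvidia.com/gpu' or 'amd.com/gpu'.
    return '%s.com/gpu' % vendor
```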

Member (IronPan), Nov 21, 2018:

I hope we don't make assumptions here; e.g. a user might want to use an AMD one. The API should be extensible.

I would also hope that we just expose the k8s resource limit directly. It's hard to predict what resources users will want to use in the future.

Author (hongye-sun):

+1 for exposing resource limits and requests directly as generic key-value pairs. We should keep the existing cpu and memory ones, which are used frequently.

Contributor:

Okay.

@Ark-kun Ark-kun requested a review from IronPan November 21, 2018 02:15
qimingj (Contributor) commented Nov 22, 2018

/lgtm

qimingj (Contributor) commented Nov 22, 2018

/approve

k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qimingj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

(1 similar comment from @k8s-ci-robot)

@k8s-ci-robot k8s-ci-robot merged commit 486d43d into master Nov 22, 2018
texasmichelle (Member):
Great work! Super happy to see this implemented so quickly. Thanks!!

@IronPan IronPan deleted the hongyes/gpu branch July 10, 2019 18:34