Add configurable QPS and burst settings for kube API client #2411
base: release-1.9
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Add Changelog for Training Operator v1.9.0-rc.0 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Group PR for new features Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
---------
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…w#2379) Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
* Add MNIST example with SPMD for JAX. Illustrate how to use JAX's `pmap` to express and execute single-program multiple-data (SPMD) programs for data parallelism along a batch dimension Signed-off-by: Sandipan Panda <samparksandipan@gmail.com>
* Update CONTRIBUTING.md. Use --server-side to install the latest local changes of Training Operator control plane Signed-off-by: Sandipan Panda <samparksandipan@gmail.com>
* Add JAXJob output Signed-off-by: Sandipan Panda <samparksandipan@gmail.com>
* Update JAXJob CI images Signed-off-by: Sandipan Panda <samparksandipan@gmail.com>
* Adjust jaxjob spmd example batch size Signed-off-by: Sandipan Panda <samparksandipan@gmail.com>
* Add JAX Example Docker Image Build in CI Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Fix script name typo Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Update script permissions Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Add KIND_CLUSTER env var Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Increase timeouts Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Test higher resources Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Increase Timeout Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* remove resource reqs Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* test low batch size Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* test small batch size Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
* Hardcode number of batches Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
---------
Signed-off-by: Sandipan Panda <samparksandipan@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Co-authored-by: Sandipan Panda <samparksandipan@gmail.com>
…lizers (kubeflow#2323)
* KEP-2170: Add unit and integration tests for model and dataset initializers Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>
* refactor tests Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>
---------
Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.30.0 to 0.33.0. - [Commits](golang/net@v0.30.0...v0.33.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: ChristianZaccaria <christian.zaccaria.cz@gmail.com>
* KEP-2170: Deploy JobSet in kubeflow-system namespace Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Remove namespace from base Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Remove label from namespace Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Create third-party dir for JobSet Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Bump JobSet to v0.7.3 Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Drop namespace from JobSet config Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
---------
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: Antonin Stefanutti <antonin@stefanutti.fr>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
25eb8f6 to 5bf1d1e
Introduce new flags to configure `QPS` and `Burst` for the Kubernetes API client, enabling better control over API rate limits. Signed-off-by: R.K <ron.kahn@run.ai>
8a8ce28 to a5c93da
Pull Request Test Coverage Report for Build 13135248279
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
@ronk21runai Could you rebase this PR on top of the release-1.9 branch? We have already removed the v1 code from the master branch.
Mostly lgtm
Thank you
@ronk21runai Once you rebase and address my comment, we can include this in release-1.9.
cfg.QPS = float32(clientQps)
cfg.Burst = clientBurst
cfg.QPS = float32(clientQps)
cfg.Burst = clientBurst
cfg.RateLimiter = flowcontrol.NewTokenBucketRateLimiter(float32(clientQps), clientBurst)
IIUC, because of how controller-runtime handles the client config, we need to pass those parameters through the RateLimiter.
Thank you for this great contribution!
Ideally, we want to support these and the manager-specific parameters in the Config API for v2.
I agree, I will create an issue to add Config API support to Kubeflow Trainer V2.
What this PR does / why we need it:
Currently, QPS (20) and Burst (30) for the Kubernetes API client come from the controller-runtime defaults and cannot be adjusted by the user. This PR lets users tune these values, which can improve the controller's performance.
Introduce new flags to configure QPS and Burst for the Kubernetes API client, enabling better control over API rate limits.
Proposed Changes
This PR introduces two new argument flags:
These flags let users configure the API rate limits when starting the operator instead of relying on the default values.
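As a rough sketch of how such flags could be wired up, the following uses only the Go standard `flag` package. The flag names `client-qps` and `client-burst`, the `clientOptions` type, and the integer flag type are assumptions made for illustration; the PR's actual flag names are not shown in this conversation. The defaults mirror the controller-runtime values mentioned above.

```go
package main

import (
	"flag"
	"fmt"
)

// clientOptions holds the two rate-limit settings (hypothetical type).
type clientOptions struct {
	QPS   int
	Burst int
}

// parseClientOptions parses the hypothetical --client-qps and --client-burst
// flags, falling back to 20 and 30 when they are not set.
func parseClientOptions(args []string) (clientOptions, error) {
	var opts clientOptions
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.IntVar(&opts.QPS, "client-qps", 20, "QPS for the Kubernetes API client")
	fs.IntVar(&opts.Burst, "client-burst", 30, "burst for the Kubernetes API client")
	if err := fs.Parse(args); err != nil {
		return clientOptions{}, err
	}
	// Reject non-positive values so a typo cannot silently disable the limits.
	if opts.QPS <= 0 || opts.Burst <= 0 {
		return clientOptions{}, fmt.Errorf("qps and burst must be positive, got qps=%d burst=%d", opts.QPS, opts.Burst)
	}
	return opts, nil
}

func main() {
	opts, err := parseClientOptions([]string{"--client-qps=50", "--client-burst=100"})
	if err != nil {
		panic(err)
	}
	// In the operator these values would then be copied onto the client
	// config, as in the reviewed snippet: cfg.QPS = float32(clientQps) and
	// cfg.Burst = clientBurst.
	fmt.Printf("QPS=%v Burst=%d\n", float32(opts.QPS), opts.Burst) // prints QPS=50 Burst=100
}
```

Validating the values at parse time is a design choice of this sketch, not something the PR is known to do.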
Checklist: