Jetstream Autoscaling Guide #703

Merged
43 commits merged into main on Jun 17, 2024

Conversation

Bslabe123
Collaborator

No description provided.

@liurupeng
Collaborator

/hold Please don't merge before I take a look, thanks.

@liurupeng
Collaborator

It looks like we've added a bunch of files so that Terraform deploys the HPA resources, instead of deploying the PodMonitoring and the Custom Metrics Stackdriver Adapter (CMSA) with YAML manifests. What are the benefits of this approach? @Bslabe123

@liurupeng
Collaborator

/gcbrun

@Bslabe123
Collaborator Author

/gcbrun

Bslabe123 merged commit c62d2ba into main on Jun 17, 2024
9 checks passed
leroyjb pushed a commit to leroyjb/ai-on-gke that referenced this pull request on Jan 24, 2025
* first commit

* missing files

* various improvements

* some autoscaling changes for testing

* add targetlabels to podmonitoring

* Revert repo pinning

* more reversions

* more reversions

* cleanup

* more cleanup

* Added to README

* revert topology change

* tweaks to deployment

* HPA terraform fixes

* remove stray comment

* Add more to README

* parameterize metrics scrape port

* Cleaned up readme

* readme tweak

* typo

* remove indentation

* newline

* More updates to readme

* change wording

* Update metrics scrape example

* remove annotation

* terraform format

* missing comma

* maxengine-server in terraform

* wording

* terraform fmt

* parameterize container images

* wording

* remove ksa var

* move deployment to kubectl directory

* App -> app

* pipe from maxengine module to main

* Update tutorials-and-examples/inference-servers/jetstream/maxtext/single-host-inference/README.md

Co-authored-by: RupengLiu <rupliu@google.com>

* remove TODO

* HPA can now scale with HBM

---------

Co-authored-by: RupengLiu <rupliu@google.com>
5 participants