Bump version to v0.15.0-rc.2 #596

Merged
elezar merged 3 commits into NVIDIA:main from bump-version-v0.15.0-rc.2 on Mar 16, 2024

elezar (Member) commented Mar 15, 2024

No description provided.

elezar self-assigned this on Mar 15, 2024
- Bump CUDA base image version to 12.3.2
- Add `cdi-cri` device list strategy. This uses the CDIDevices CRI field to request CDI devices instead of annotations.
- Set MPS memory limit by device index and not device UUID. This is a workaround for an issue where these limits are not applied for devices if set by UUID.
- Update MPS sharing to disallow requests for multiple devices if MPS sharing is configured.
- Enforce replica limits for MPS sharing.
Contributor

Is this really all that changed?

Collaborator

❯  git log --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit -90 |grep -v Merge |grep -v Bump
11c7131e - Enforce maximum MPS replicas (25 hours ago) <Evan Lezar>
b3218699 - Run tail -f for each MPS daemon to output logs (2 days ago) <Evan Lezar>
93f161cc - Add mig-strategy flag to mps-control-daemon (2 days ago) <Evan Lezar>
4f3e4a58 - Cleanup log dir on stop (2 days ago) <Evan Lezar>
95be0832 - Explicitly set sharing.mps.failRequestsGreaterThanOne = true (2 days ago) <Evan Lezar>
13cf3b4c - Change validation logic for MPS sharing (2 days ago) <Evan Lezar>
d5f33b7d - Factor out allocate request validation (2 days ago) <Evan Lezar>
5edd66f6 - Set mps device memory limit by index (2 days ago) <Evan Lezar>
7302a18e - (origin/badge, badge) Add Status badges (8 days ago) <Carlos Eduardo Arango Gutierrez>
f3586af5 - (upstream/e2eactions, e2eactions) Add e2e github action (8 days ago) <Carlos Eduardo Arango Gutierrez>
00f34100 - Add dependabot config to update actions for gh-pages (9 days ago) <Evan Lezar>
6eb8d576 - Fix GitHub staging registry (9 days ago) <Evan Lezar>
dc38950a - Add cdi-cri device list strategy (11 days ago) <Evan Lezar>
cfcdcceb - Refactor label output (2 weeks ago) <Evan Lezar>
ca84c1b3 - Replace k8s-client.go with client sets (2 weeks ago) <Evan Lezar>
0727338f - Move use-node-feature-api to config structs (2 weeks ago) <Evan Lezar>
362d1d0f - Update nvidia-container-toolkit instructions (2 weeks ago) <Evan Lezar>
93b39b67 - (upstream/clean-go-mod) clean up replace directives in go.mod (2 weeks ago) <Tariq Ibrahim>
516945ce - TOFIX: Allow go mod tidy to set go version (2 weeks ago) <Evan Lezar>
2dbf357c - Add vendor check to actions (2 weeks ago) <Evan Lezar>
9e9fb58a - Extract GOLANG_VERSION from versions.mk (2 weeks ago) <Evan Lezar>
9cfdf86a - (origin/operations-per-run, operations-per-run) Increase operations-per-run on stale action (2 weeks ago) <Carlos Eduardo Arango Gutierrez>
ffb4b015 - Update github.com/mittwald/go-helm-client to v0.12.8 (2 weeks ago) <Evan Lezar>
86d94ffd - (origin/lifecyce, lifecyce) Edit stale message (2 weeks ago) <Carlos Eduardo Arango Gutierrez>
bc818aac - Use github image as staging image (3 weeks ago) <Evan Lezar>
4ac55b89 - Remove deprecated extensions/v1beta1 static deployment (3 weeks ago) <Evan Lezar>
6e209be5 - Update k8s.io/kubernetes to v1.28.7 (3 weeks ago) <Evan Lezar>
637326e1 - Fix typo on label name at stale action def (3 weeks ago) <Carlos Eduardo Arango Gutierrez>
cb9f53a2 - (origin/go-version-file, go-version-file) Update golang gh-Action to use go-version-file (3 weeks ago) <Carlos Eduardo Arango Gutierrez>
cb3949df - (origin/ghaction-stale, ghaction-stale) Add actions/stale gh-action (3 weeks ago) <Carlos Eduardo Arango Gutierrez>
778f0740 - Add changelog for v0.15.0-rc.1 (3 weeks ago) <Evan Lezar>
05897874 - Add MPS sharing section to README (3 weeks ago) <Evan Lezar>

These are the non-merge, non-dependabot commits since rc.1.
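
The filtering used in the command above can be sketched as a small reusable function. This is only an illustration: the function names `filter_changelog_candidates` and `commits_since` are hypothetical, and the tag name `v0.15.0-rc.1` is an assumption based on the rc.1 changelog commit in the log. It also uses a tag range instead of `-90`, so the list starts at the previous release candidate rather than at a fixed commit count.

```shell
# Drop merge commits and version/dependency bumps from a list of
# one-line commit summaries read on stdin (same greps as the command above).
filter_changelog_candidates() {
    grep -v -e 'Merge' -e 'Bump'
}

# List candidate commits since a given tag.
# The default tag name is an assumption; pass the real one as $1.
# Must be run inside a git checkout that contains the tag.
commits_since() {
    since_tag="${1:-v0.15.0-rc.1}"
    git log --pretty=format:'%h - %s (%cr) <%an>' --abbrev-commit "${since_tag}..HEAD" \
        | filter_changelog_candidates
}
```

Running `commits_since v0.15.0-rc.1` would then print the same kind of list as above. Note that filtering on the bare words `Merge` and `Bump` also drops any regular commit whose subject happens to contain them.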

Contributor

I'd say these are the relevant ones to include then:

11c7131e - Enforce maximum MPS replicas (25 hours ago) <Evan Lezar>
b3218699 - Run tail -f for each MPS daemon to output logs (2 days ago) <Evan Lezar>
95be0832 - Explicitly set sharing.mps.failRequestsGreaterThanOne = true (2 days ago) <Evan Lezar>
5edd66f6 - Set mps device memory limit by index (2 days ago) <Evan Lezar>

Member Author

You're right. I have updated the changelog.

As a side note, it would be good to generate this automatically if possible.
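
One possible starting point for that, as a rough sketch only: a filter that turns `hash - subject (age) <author>` lines, as printed by the `git log` format used earlier in this thread, into changelog bullets. The function name `to_changelog_bullets` is hypothetical, and nothing like it is known to exist in the repository.

```shell
# Turn "hash - subject (age) <author>" lines on stdin into "- subject" bullets.
# Ref decorations such as "(origin/badge, badge)" are left in place and
# would still need to be removed by hand.
to_changelog_bullets() {
    sed -E -e 's/^[0-9a-f]+ - /- /' -e 's/ \([^()]*\) <[^<>]*>$//'
}
```

The output would still need manual editing, since commit subjects are not always phrased the way a changelog entry should be.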

Collaborator

There's a way to do it in the GitHub UI. I'll show you next week.

Member Author

I know about the release notes for releases; what I mean is updating the changelog.

But maybe the issue is that I try to keep it up to date while making the changes.

elezar force-pushed the bump-version-v0.15.0-rc.2 branch 2 times, most recently from a391799 to eddf39f, on March 15, 2024 20:53
elezar requested a review from klueska on March 15, 2024 20:57
elezar added 3 commits March 16, 2024 07:19
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The static deployments were using inconsistent versions. This change updates
them all to v0.15.0-rc.2.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
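
A check along the following lines could catch this kind of version drift before a release. It is a sketch under assumptions: the function names are hypothetical, and the pattern assumes the manifests reference the image as `k8s-device-plugin:<tag>`.

```shell
# Print each distinct k8s-device-plugin image tag found on stdin, one per line.
distinct_plugin_tags() {
    grep -o 'k8s-device-plugin:[^"[:space:]]*' | sed 's/^k8s-device-plugin://' | sort -u
}

# Succeed only if exactly one distinct tag is present in the input.
tags_are_consistent() {
    [ "$(distinct_plugin_tags | wc -l | tr -d ' ')" -eq 1 ]
}
```

For example, `cat deployments/static/*.yml | tags_are_consistent || echo "inconsistent versions"` would flag a mismatch, assuming the static deployments live under that path.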
elezar force-pushed the bump-version-v0.15.0-rc.2 branch from 53302eb to bcc2a47 on March 16, 2024 05:21
elezar requested a review from ArangoGutierrez on March 16, 2024 05:22
elezar merged commit e58985a into NVIDIA:main on Mar 16, 2024
6 checks passed
elezar deleted the bump-version-v0.15.0-rc.2 branch on March 16, 2024 10:45