bugfix: Allow for optional TimeSlicing configuration #1018
base: main
1c6e3fe to e6ec8d3
Thanks @danielkleinstein.
I have made some suggestions for minor cleanups. I understand that the way this was done probably mirrors the same checks done for MPS, so I'm also ok to merge this as-is and then drop the redundant == nil checks in a follow-up (or as a second commit on top of the one here).
@elezar Thanks for the quick review 🙂 I removed the unnecessary nil checks you commented on in a separate commit.
f0505c9 to d1bb887
@danielkleinstein the email address on your second commit is different to the first. Please update.
db0e5d8 to 1e8822c
@elezar Fixed
eadb4e2 to 2c80a56
Head branch was pushed to by a user without write access
2c80a56 to 75eb65b
Please run make check and / or make test locally if possible:
diff --git a/internal/lm/mig-strategy_test.go b/internal/lm/mig-strategy_test.go
index 005fc961..5aa7b9f6 100644
--- a/internal/lm/mig-strategy_test.go
+++ b/internal/lm/mig-strategy_test.go
@@ -30,7 +30,7 @@ func TestMigStrategyNoneLabels(t *testing.T) {
testCases := []struct {
description string
devices []resource.Device
- timeSlicing spec.ReplicatedResources
+ timeSlicing *spec.ReplicatedResources
expectedError bool
expectedLabels Labels
}{
diff --git a/internal/lm/resource_test.go b/internal/lm/resource_test.go
index 3f180188..3f44763c 100644
--- a/internal/lm/resource_test.go
+++ b/internal/lm/resource_test.go
@@ -271,7 +271,7 @@ func TestMigResourceLabeler(t *testing.T) {
description string
resourceName spec.ResourceName
count int
- timeSlicing spec.ReplicatedResources
+ timeSlicing *spec.ReplicatedResources
expectedLabels Labels
}{
{
@@ -301,7 +301,7 @@ func TestMigResourceLabeler(t *testing.T) {
description: "shared appends suffix and doubles count",
resourceName: "nvidia.com/gpu",
count: 1,
- timeSlicing: spec.ReplicatedResources{
+ timeSlicing: &spec.ReplicatedResources{
Resources: []spec.ReplicatedResource{
{
Name: "nvidia.com/gpu",
@@ -329,7 +329,7 @@ func TestMigResourceLabeler(t *testing.T) {
description: "renamed does not append suffix and doubles count",
resourceName: "nvidia.com/gpu",
count: 1,
- timeSlicing: spec.ReplicatedResources{
+ timeSlicing: &spec.ReplicatedResources{
Resources: []spec.ReplicatedResource{
{
Name: "nvidia.com/gpu",
@@ -358,7 +358,7 @@ func TestMigResourceLabeler(t *testing.T) {
description: "mig mixed appends shared",
resourceName: "nvidia.com/mig-1g.1gb",
count: 1,
- timeSlicing: spec.ReplicatedResources{
+ timeSlicing: &spec.ReplicatedResources{
Resources: []spec.ReplicatedResource{
{
Name: "nvidia.com/gpu",
@@ -391,7 +391,7 @@ func TestMigResourceLabeler(t *testing.T) {
description: "mig mixed rename does not append",
resourceName: "nvidia.com/mig-1g.1gb",
count: 1,
- timeSlicing: spec.ReplicatedResources{
+ timeSlicing: &spec.ReplicatedResources{
Resources: []spec.ReplicatedResource{
{
Name: "nvidia.com/mig-1g.1gb",
@elezar Sorry, was kind of hoping to use the project's CI to run the tests 👻 I was working from a temporary Github Codespaces and |
We require an opt-in to run tests. (Still working on improving the configurations here).
Head branch was pushed to by a user without write access
39063d1 to eb95c30
This fixes a vestigial bug from the introduction of MPS. Now that two timeslicing configurations are available, it's possible for MPS to be configured instead of timeslicing. But since the TimeSlicing field was made non-optional, its existence is forced when unmarshalling, and then the parsing fails because no resources are specified under the empty TimeSlicing field.
Signed-off-by: Daniel Kleinstein <daniel.kleinstein@gmail.com>
Signed-off-by: Daniel Kleinstein <daniel.kleinstein@gmail.com>
…y is defined Signed-off-by: Daniel Kleinstein <daniel@scaleops.com>
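The failure mode described in the first commit message above can be illustrated with a minimal sketch. It uses only the standard library's encoding/json, and every type and function in it (ReplicatedResources, requiredSharing, optionalSharing, parseRequired, parseOptional) is a simplified, hypothetical stand-in rather than the project's actual code. The assumption is that validation rejects a section with no resources: with a value field that check always runs, while a pointer field stays nil when the key is absent and can be skipped.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ReplicatedResources is a hypothetical, simplified stand-in for the
// project's type of the same name.
type ReplicatedResources struct {
	Resources []string `json:"resources"`
}

// validate rejects a section that lists no resources.
func (r ReplicatedResources) validate() error {
	if len(r.Resources) == 0 {
		return errors.New("no resources specified")
	}
	return nil
}

// requiredSharing holds TimeSlicing by value, so it always exists
// after unmarshalling, even when the key is missing from the input.
type requiredSharing struct {
	TimeSlicing ReplicatedResources  `json:"timeSlicing"`
	MPS         *ReplicatedResources `json:"mps,omitempty"`
}

// optionalSharing holds TimeSlicing by pointer, so a missing key
// leaves the field nil.
type optionalSharing struct {
	TimeSlicing *ReplicatedResources `json:"timeSlicing,omitempty"`
	MPS         *ReplicatedResources `json:"mps,omitempty"`
}

// parseRequired validates TimeSlicing unconditionally.
func parseRequired(data []byte) error {
	var s requiredSharing
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	return s.TimeSlicing.validate()
}

// parseOptional validates only the sections that are present.
func parseOptional(data []byte) error {
	var s optionalSharing
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	if s.TimeSlicing != nil {
		if err := s.TimeSlicing.validate(); err != nil {
			return err
		}
	}
	if s.MPS != nil {
		if err := s.MPS.validate(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// A configuration that sets up MPS only and omits timeSlicing.
	mpsOnly := []byte(`{"mps": {"resources": ["nvidia.com/gpu"]}}`)
	fmt.Println("value field:  ", parseRequired(mpsOnly))
	fmt.Println("pointer field:", parseOptional(mpsOnly))
}
```

With the value field, the MPS-only input is rejected because the empty TimeSlicing section is validated anyway; with the pointer field, the same input parses cleanly.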
eb95c30 to 72b9e0b
@elezar I need some guidance - in a separate commit, I changed

    func (rl resourceLabeler) sharingDisabled() bool {
        return rl.sharing == nil
    }

To:

    func (rl resourceLabeler) sharingDisabled() bool {
        return rl.sharing == nil || (rl.sharing.SharingStrategy() == spec.SharingStrategyNone)
    }

Which was necessary to get

    func (rl resourceLabeler) getReplicas() int {
        if rl.sharingDisabled() {
            return 0
        } else if r := rl.replicationInfo(); r != nil && r.Replicas > 0 {
            return r.Replicas
        }
        return 1
    }

What's strange to me is that in the first test case, sharing seems to indeed be disabled (because

An alternative fix for

    func (rl resourceLabeler) replicationInfo() *spec.ReplicatedResource {
        if rl.sharingDisabled() {
            return nil
        }
        for _, r := range rl.sharing.ReplicatedResources().Resources {
            if r.Name == rl.resourceName {
                return &r
            }
        }
        return nil
    }

To:

    func (rl resourceLabeler) replicationInfo() *spec.ReplicatedResource {
        if rl.sharingDisabled() {
            return nil
        }
        rr := rl.sharing.ReplicatedResources()
        if rr == nil {
            return nil
        }
        for _, r := range rr.Resources {
            if r.Name == rl.resourceName {
                return &r
            }
        }
        return nil
    }

But I'm not sure if this is a desirable change.
In the existing implementation, the What about removing the last commit that attempts to address this and making the following change:
This ensures that |