-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge kubeflow/common to training-operator #1813
Conversation
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Syulin7 <735122171@qq.com>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/cc @alculquicondor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
Followups: We need to clean up redundantly and duplicate codes in separate PRs.
/lgtm
I don't have bandwidth to review this, but the idea sgtm! |
/cc @gaocegege |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @johnugeorge, I left few comments
type SchedulingPolicy struct { | ||
MinAvailable *int32 `json:"minAvailable,omitempty"` | ||
Queue string `json:"queue,omitempty"` | ||
MinResources *map[v1.ResourceName]resource.Quantity `json:"minResources,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y @johnugeorge Why we can't just re-use Kubernetes Types ? e.g.
MinResources corev1.ResourceList
// initializeReplicaStatuses initializes the ReplicaStatuses for replica. | ||
func initializeReplicaStatuses(jobStatus *apiv1.JobStatus, rtype apiv1.ReplicaType) { | ||
core.InitializeReplicaStatuses(jobStatus, rtype) | ||
} | ||
|
||
// updateJobReplicaStatuses updates the JobReplicaStatuses according to the pod. | ||
func updateJobReplicaStatuses(jobStatus *apiv1.JobStatus, rtype apiv1.ReplicaType, pod *corev1.Pod) { | ||
core.UpdateJobReplicaStatuses(jobStatus, rtype, pod) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this somewhere ?
I can see that each Controller has its own func: https://github.com/kubeflow/training-operator/blob/master/pkg/controller.v1/tensorflow/util.go#L110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
type PriorityClassGetFunc func(string) (*schedulingv1.PriorityClass, error) | ||
|
||
func CalcPGMinResources(minMember int32, replicas map[apiv1.ReplicaType]*apiv1.ReplicaSpec, pcGetFunc PriorityClassGetFunc) *v1.ResourceList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find where we use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
type FillPodGroupSpecFunc func(object metav1.Object) error | ||
|
||
func (jc *JobController) SyncPodGroup(job metav1.Object, specFunc FillPodGroupSpecFunc) (metav1.Object, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, in the following PRs we should identify part of common repo that we are not using today.
For example, I was not able to find where we use this scheduling plugin.
@@ -0,0 +1,29 @@ | |||
## Test Job Controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this test Job Controller ?
We can take cleanups in a separate PR? This PR can be restricted to merge so as track @andreyvelich |
Sure, sounds good. LGTM. It would be nice if we could create an issue to track those clean-ups |
Agree. |
Created: #1816 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @johnugeorge!
/lgtm
@johnugeorge Feel free to do |
Thanks @tenzen-y /hold cancel |
This PR tracks kubeflow/common to training operator repo. More optimizations will be tracked in a separate PR.
Related: #1714