Flaky test: [It] should update TFJob with desired status #1820

Closed
tenzen-y opened this issue May 31, 2023 · 16 comments · Fixed by #1846

@tenzen-y
Member

tenzen-y commented May 31, 2023

Flaky case: Chief worker is succeeded

• [FAILED] [0.026 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-05-31T14:16:30Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-05-31T14:16:30Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"a24bd44b-a9de-48dc-a982-e54353988325"}, "reason": "SuccessfulCreateService"}
  2023-05-31T14:16:30Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": "default/test-status-0", "unable to fetch TFJob": "default/test-status-0"}
  2023-05-31T14:16:30Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"a24bd44b-a9de-48dc-a982-e54353988325"}, "reason": "SuccessfulCreateService"}
  2023-05-31T14:16:30Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-05-31T14:16:30Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Worker.template.metadata.creationTimestamp"
  2023-05-31T14:16:30Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-05-31T14:16:30Z"}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 05/31/23 14:16:30.732
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 05/31/23 14:16:30.732
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5133950363/jobs/9237255811#step:4:126
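The "checking status" line logged just before the failure suggests why the assertion at status_test.go:458 came out false. The snapshot has "conditions":null and an empty Chief entry. A passing run of the same case (the July 3 log in this thread) records "Chief":{"succeeded":1} and a Succeeded condition. The sketch below decodes the logged payload to make that difference visible. The struct types, `decodeStatus` and `hasTrueCondition` are hypothetical stand-ins that mirror only the logged JSON fields. They are not the operator's real API types, and the sketch does not reproduce what line 458 actually compares.

```go
package main

import "encoding/json"

// Hypothetical mirrors of the fields printed in the "checking status" log line.
// They are NOT training-operator API types; they only decode the logged JSON.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

type replicaStatus struct {
	Active    int32 `json:"active"`
	Succeeded int32 `json:"succeeded"`
	Failed    int32 `json:"failed"`
}

type jobStatus struct {
	Conditions      []condition              `json:"conditions"`
	ReplicaStatuses map[string]replicaStatus `json:"replicaStatuses"`
}

// loggedStatus is the "tfJob.Status" payload copied verbatim from the failing run.
// Fields not declared above (e.g. startTime) are ignored by json.Unmarshal.
const loggedStatus = `{"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-05-31T14:16:30Z"}`

// decodeStatus parses a logged status snapshot into the hypothetical types.
func decodeStatus(raw string) (jobStatus, error) {
	var st jobStatus
	err := json.Unmarshal([]byte(raw), &st)
	return st, err
}

// hasTrueCondition reports whether a condition of the given type has status "True".
func hasTrueCondition(st jobStatus, condType string) bool {
	for _, c := range st.Conditions {
		if c.Type == condType && c.Status == "True" {
			return true
		}
	}
	return false
}
```

Decoding loggedStatus yields no Succeeded condition and a Chief succeeded count of 0. This is consistent with the status having been read before the controller observed the chief pod finishing.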

@tenzen-y
Member Author

tenzen-y commented Jun 4, 2023

Similar flaky test: (No chief worker) Worker is running

------------------------------
• [FAILED] [0.098 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
...
  Timeline >>
  2023-06-04T17:26:13Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
  2023-06-04T17:26:13Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
  2023-06-04T17:26:13Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "SuccessfulCreateService"}
  2023-06-04T17:26:13Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "TFJobSucceeded"}
  2023-06-04T17:26:13Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-06-04T17:26:13Z","lastTransitionTime":"2023-06-04T17:26:13Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-06-04T17:26:13Z","completionTime":"2023-06-04T17:26:13Z"}}
  2023-06-04T17:26:13Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-06-04T17:26:13Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-06-04T17:26:13Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951

https://github.com/kubeflow/training-operator/actions/runs/5170338558/jobs/9313174311?pr=1824#step:4:406

@lowang-bh
Member

@tenzen-y
Member Author

@lowang-bh Thanks for reporting that. However, that failure doesn't look related to this test, so I created a separate issue:

#1838

@tenzen-y
Member Author

tenzen-y commented Jul 3, 2023

Similar flaky test:

------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-0 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-0", "job description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-1", "job description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-2", "job description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-3-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-5-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-9-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-9 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-10-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-11-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-11 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-12-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345

@tenzen-y
Member Author

tenzen-y commented Jul 3, 2023

/assign

@tenzen-y
Member Author

tenzen-y commented Jul 3, 2023

I couldn't reproduce these failures locally with the following command:

KUBEBUILDER_ASSETS="$(setup-envtest use "${ENVTEST_K8S_VERSION}" -p path)" \
	./bin/ginkgo --until-it-fails -v ./pkg/controller.v1/tensorflow/...

I suspect these tests simply take longer in CI, because the CI environment has much more limited compute resources. So we should increase the timeout for gomega.Eventually.

@tenzen-y
Member Author

tenzen-y commented Jul 3, 2023

The default timeout for gomega.Eventually is 1s.

@johnugeorge
Member

That is a good point

@tenzen-y
Member Author

tenzen-y commented Jul 3, 2023

That is a good point

I'm working on this improvement.

@tenzen-y
Member Author

tenzen-y commented Jul 4, 2023

Only this one recurred:

/reopen

Similar flaky test:

------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-0 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-0", "job description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-1", "job description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-2", "job description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-3-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-5-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-9-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-9 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-10-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-11-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-11 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-12-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345

@google-oss-prow google-oss-prow bot reopened this Jul 4, 2023
@google-oss-prow

@tenzen-y: Reopened this issue.

In response to this:

Only this one has recurred:

https://github.com/kubeflow/training-operator/actions/runs/5451187547/jobs/9917173825#step:4:510

/reopen

Similar flaky test:

------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76

  Timeline >>
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-tfjob has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-0-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-0 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-0" not found	{"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-0", "job description": "Chief worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-1" not found	{"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-1-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-1", "job description": "Chief worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created pod: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-2" not found	{"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-2-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-2", "job description": "Chief worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-3" not found	{"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-3-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-4" not found	{"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-4-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-4 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-5" not found	{"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-5-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-6-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-6" not found	{"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
  2023-07-03T15:53:48Z	INFO	KubeAPIWarningLogger	unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-7" not found	{"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-7-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	DEBUG	events	TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:48Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
  2023-07-03T15:53:48Z	INFO	passed!	{"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	testing case	{"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:48Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:48Z	DEBUG	events	Created service: test-status-8-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-8" not found	{"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-8-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-9-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-9" not found	{"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-9-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-9 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-10-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-10" not found	{"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-10-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-11-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-11" not found	{"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-11-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-11 successfully completed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Pod: default.test-status-12-worker-0 exited with code 0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-12" not found	{"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-12-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-2	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-worker-3	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-ps-1	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-13" not found	{"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
  2023-07-03T15:53:49Z	DEBUG	events	Created service: test-status-13-chief-0	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
  2023-07-03T15:53:49Z	DEBUG	events	TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.	{"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
  2023-07-03T15:53:49Z	INFO	passed!	{"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
  2023-07-03T15:53:49Z	INFO	testing case	{"description": "Chief is running, workers are succeeded"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	TFJob.kubeflow.org "test-status-14" not found	{"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
  2023-07-03T15:53:49Z	INFO	checking status	{"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
  [FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
  << Timeline

  [FAILED] Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------

https://github.com/kubeflow/training-operator/actions/runs/5446399997/jobs/9907080100?pr=1843#step:4:345

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tenzen-y
Member Author

tenzen-y commented Jul 4, 2023

It seems that the other failures are resolved.

@github-actions

github-actions bot commented Oct 2, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member Author

tenzen-y commented Oct 2, 2023

/lifecycle frozen

@tenzen-y
Member Author

/close


@tenzen-y: Closing this issue.

In response to this:

/close

