Flaky test: [It] should update TFJob with desired status #1820
Similar flaky test: ------------------------------
• [FAILED] [0.098 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
...
Timeline >>
2023-06-04T17:26:13Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
2023-06-04T17:26:13Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": "default/test-status-4", "unable to fetch TFJob": "default/test-status-4"}
2023-06-04T17:26:13Z DEBUG events Created service: test-status-4-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "SuccessfulCreateService"}
2023-06-04T17:26:13Z DEBUG events TFJob default/test-status-4 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"8d78a0a3-fb74-416b-902b-1cb0794616e1"}, "reason": "TFJobSucceeded"}
2023-06-04T17:26:13Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-06-04T17:26:13Z","lastTransitionTime":"2023-06-04T17:26:13Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-06-04T17:26:13Z","completionTime":"2023-06-04T17:26:13Z"}}
2023-06-04T17:26:13Z INFO passed! {"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
2023-06-04T17:26:13Z INFO testing case {"description": "(No chief worker) Worker is running"}
2023-06-04T17:26:13Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{},"Worker":{}}}}
[FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951
<< Timeline
[FAILED] Expected
<bool>: false
to be true
In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 06/04/23 17:26:13.951
should create missing Pods
@lowang-bh Thanks for reporting that. However, that case doesn't seem to be similar to this test.
Similar flaky test: ------------------------------
• [FAILED] [0.531 seconds]
TFJob controller Test Status [It] should update TFJob with desired status
/home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:76
Timeline >>
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is succeeded"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-tfjob has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-tfjob"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-0-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-0 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-0","uid":"1fd47494-bf58-437e-b51b-5f36baa673da"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-0" not found {"tfjob": {"name":"test-status-0","namespace":"default"}, "unable to fetch TFJob": "default/test-status-0"}
2023-07-03T15:53:48Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.Chief.template.metadata.creationTimestamp"
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-0 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"succeeded":1},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-0", "job description": "Chief worker is succeeded"}
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is running"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z DEBUG events Created pod: test-status-1-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreatePod"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-1" not found {"tfjob": {"name":"test-status-1","namespace":"default"}, "unable to fetch TFJob": "default/test-status-1"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-1-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-1","uid":"cb23297b-1f26-4599-af2a-d371e64bf706"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-1 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-1", "job description": "Chief worker is running"}
2023-07-03T15:53:48Z INFO testing case {"description": "Chief worker is failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z DEBUG events Created pod: test-status-2-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreatePod"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-2" not found {"tfjob": {"name":"test-status-2","namespace":"default"}, "unable to fetch TFJob": "default/test-status-2"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-2-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-2 has failed because 1 Chief replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-2","uid":"6fb54294-309d-4fe8-a7c2-57eed025e2f1"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-2 has failed because 1 Chief replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{"failed":1},"PS":{},"Worker":{}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-2", "job description": "Chief worker is failed"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-3" not found {"tfjob": {"name":"test-status-3","namespace":"default"}, "unable to fetch TFJob": "default/test-status-3"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-3-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-3 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-3","uid":"629f34d4-36bd-461b-ab76-b667b82f26de"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-3 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"failed":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-3", "job description": "(No chief worker) Worker is failed"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is succeeded"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-4" not found {"tfjob": {"name":"test-status-4","namespace":"default"}, "unable to fetch TFJob": "default/test-status-4"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-4-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-4 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-4","uid":"4d770ef0-c20a-4743-a5c0-d0311b742723"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-4 successfully completed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":1}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-4", "job description": "(No chief worker) Worker is succeeded"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) Worker is running"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-5" not found {"tfjob": {"name":"test-status-5","namespace":"default"}, "unable to fetch TFJob": "default/test-status-5"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-5-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-5","uid":"9d893137-d9f5-4e03-9112-ffd9e07d3dce"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-5 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":1}},"startTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-5", "job description": "(No chief worker) Worker is running"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-6-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-6","uid":"d211c54a-a565-4aba-848b-7b97dbbe5365"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-6" not found {"tfjob": {"name":"test-status-6","namespace":"default"}, "unable to fetch TFJob": "default/test-status-6"}
2023-07-03T15:53:48Z INFO KubeAPIWarningLogger unknown field "spec.tfReplicaSpecs.PS.template.metadata.creationTimestamp"
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-6 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"succeeded":2}},"startTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-6", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are active"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are running, 2 workers are failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-7" not found {"tfjob": {"name":"test-status-7","namespace":"default"}, "unable to fetch TFJob": "default/test-status-7"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-7-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z DEBUG events TFJob default/test-status-7 has failed because 2 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-7","uid":"4d06e1e8-854d-4287-a17e-0a802dc57584"}, "reason": "TFJobFailed"}
2023-07-03T15:53:48Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-7 is running.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-7 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:48Z","lastTransitionTime":"2023-07-03T15:53:48Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":2,"failed":2}},"startTime":"2023-07-03T15:53:48Z","completionTime":"2023-07-03T15:53:48Z"}}
2023-07-03T15:53:48Z INFO passed! {"job name": "test-status-7", "job description": "(No chief worker) 2 workers are running, 2 workers are failed"}
2023-07-03T15:53:48Z INFO testing case {"description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:48Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:48Z DEBUG events Created service: test-status-8-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-8" not found {"tfjob": {"name":"test-status-8","namespace":"default"}, "unable to fetch TFJob": "default/test-status-8"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-8-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-8 has failed because 2 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-8","uid":"46597163-a9c8-45b6-9441-9a38e6bfbd89"}, "reason": "TFJobFailed"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-8 has failed because 2 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":2,"failed":2}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-8", "job description": "(No chief worker) 2 workers are succeeded, 2 workers are failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-9-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-9" not found {"tfjob": {"name":"test-status-9","namespace":"default"}, "unable to fetch TFJob": "default/test-status-9"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-9-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-9 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-9","uid":"c1f8996b-0aa9-4040-9e35-08998ad227fc"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-9 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-9", "job description": "(No chief worker) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-10-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-10" not found {"tfjob": {"name":"test-status-10","namespace":"default"}, "unable to fetch TFJob": "default/test-status-10"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-10-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-10","uid":"825d4e69-90c4-4a68-ba83-a6712fcea574"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"True","reason":"TFJobRunning","message":"TFJob default/test-status-10 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":3,"succeeded":1}},"startTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-10", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 are succeeded, 3 workers are active"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-11-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-11" not found {"tfjob": {"name":"test-status-11","namespace":"default"}, "unable to fetch TFJob": "default/test-status-11"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-11-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-11 successfully completed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-11","uid":"e75310c0-c2e9-4d31-b3eb-5c75a05dc921"}, "reason": "TFJobSucceeded"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Succeeded","status":"True","reason":"TFJobSucceeded","message":"TFJob default/test-status-11 successfully completed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"succeeded":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-11", "job description": "(No chief worker, successPolicy: AllWorkers) 4 workers are succeeded"}
2023-07-03T15:53:49Z INFO testing case {"description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Pod: default.test-status-12-worker-0 exited with code 0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "ExitedWithCode"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-12" not found {"tfjob": {"name":"test-status-12","namespace":"default"}, "unable to fetch TFJob": "default/test-status-12"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-12-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-12 has failed because 1 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-12","uid":"61888304-a461-4d2d-bcf1-aea09e17d811"}, "reason": "TFJobFailed"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-12 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-12 has failed because 1 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{},"PS":{},"Worker":{"active":2,"succeeded":1,"failed":1}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-12", "job description": "(No chief worker, successPolicy: AllWorkers) worker-0 is succeeded, 2 workers are running, 1 worker is failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are failed"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-2 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-worker-3 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-ps-1 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-13" not found {"tfjob": {"name":"test-status-13","namespace":"default"}, "unable to fetch TFJob": "default/test-status-13"}
2023-07-03T15:53:49Z DEBUG events Created service: test-status-13-chief-0 {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "SuccessfulCreateService"}
2023-07-03T15:53:49Z DEBUG events TFJob default/test-status-13 has failed because 4 Worker replica(s) failed. {"type": "Normal", "object": {"kind":"TFJob","namespace":"default","name":"test-status-13","uid":"97be66d4-1f99-4310-8478-ba19cd36a25f"}, "reason": "TFJobFailed"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":[{"type":"Running","status":"False","reason":"TFJobRunning","message":"TFJob default/test-status-13 is running.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"},{"type":"Failed","status":"True","reason":"TFJobFailed","message":"TFJob default/test-status-13 has failed because 4 Worker replica(s) failed.","lastUpdateTime":"2023-07-03T15:53:49Z","lastTransitionTime":"2023-07-03T15:53:49Z"}],"replicaStatuses":{"Chief":{"active":1},"PS":{"active":2},"Worker":{"failed":4}},"startTime":"2023-07-03T15:53:49Z","completionTime":"2023-07-03T15:53:49Z"}}
2023-07-03T15:53:49Z INFO passed! {"job name": "test-status-13", "job description": "Chief is running, workers are failed"}
2023-07-03T15:53:49Z INFO testing case {"description": "Chief is running, workers are succeeded"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO TFJob.kubeflow.org "test-status-14" not found {"tfjob": {"name":"test-status-14","namespace":"default"}, "unable to fetch TFJob": "default/test-status-14"}
2023-07-03T15:53:49Z INFO checking status {"tfJob.Status": {"conditions":null,"replicaStatuses":{"Chief":{},"PS":{"active":2},"Worker":{"succeeded":4}}}}
[FAILED] in [It] - /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
<< Timeline
[FAILED] Expected
<bool>: false
to be true
In [It] at: /home/runner/work/training-operator/training-operator/go/src/github.com/kubeflow/training-operator/pkg/controller.v1/tensorflow/status_test.go:458 @ 07/03/23 15:53:49.286
------------------------------ |
/assign |
I couldn't reproduce these failures locally with the following command:
KUBEBUILDER_ASSETS="$(shell setup-envtest use $(ENVTEST_K8S_VERSION) -p path)" \
./bin/ginkgo --until-it-fails -v ./pkg/controller.v1/tensorflow/...
I guess these tests take a bit of time, so we should increase the time until timeout for |
The default timeout is 1s for the |
That is a good point |
I'm working on this improvement. |
Only this one has recurred:
/reopen
|
@tenzen-y: Reopened this issue. |
It seems that the others are resolved. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/lifecycle frozen |
/close |
@tenzen-y: Closing this issue. |
Chief worker is succeeded
https://github.com/kubeflow/training-operator/actions/runs/5133950363/jobs/9237255811#step:4:126