Fix flaky test: [It] should delete job when expired time is up #1808

Merged 1 commit on May 22, 2023

Conversation

tenzen-y
Member

What this PR does / why we need it:
This PR fixes the flaky TFJob test case "[It] should delete job when expired time is up".

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #1802

Checklist:

  • Docs included if any changes are user facing

@coveralls

coveralls commented May 20, 2023

Pull Request Test Coverage Report for Build 5033675281

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 39.584%

Totals Coverage Status
Change from base Build 4988117671: -0.1%
Covered Lines: 2742
Relevant Lines: 6927

💛 - Coveralls

@tenzen-y tenzen-y changed the title Fix flaky test: [It] should delete job when expired time is up WIP: Fix flaky test: [It] should delete job when expired time is up May 20, 2023
@tenzen-y tenzen-y force-pushed the fix-flaky-ttl-test branch from 825cc27 to cd9c3d4 Compare May 20, 2023 18:52
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
@google-oss-prow google-oss-prow bot added size/M and removed size/S labels May 20, 2023
@tenzen-y tenzen-y force-pushed the fix-flaky-ttl-test branch from cd9c3d4 to e5f9e85 Compare May 20, 2023 18:53
@google-oss-prow google-oss-prow bot added size/S and removed size/M labels May 20, 2023
Comment on lines +557 to +562
    // We need to wait for synchronizing cache.
    By("getting a created TFJob")
    var updatedTFJob kubeflowv1.TFJob
    Eventually(func() error {
        return reconciler.Get(ctx, client.ObjectKeyFromObject(tc.tfJob), &updatedTFJob)
    }, timeout, interval).Should(BeNil())
Member Author

After creating and updating a TFJob, we should fetch its latest version.

Member

Good one. Does this guarantee that the cache has synchronized and that we get the latest object?

Member Author

@tenzen-y tenzen-y May 21, 2023

Suggested change
-    // We need to wait for synchronizing cache.
-    By("getting a created TFJob")
-    var updatedTFJob kubeflowv1.TFJob
-    Eventually(func() error {
-        return reconciler.Get(ctx, client.ObjectKeyFromObject(tc.tfJob), &updatedTFJob)
-    }, timeout, interval).Should(BeNil())
+    By("getting a created TFJob")
+    var updatedTFJob kubeflowv1.TFJob
+    Expect(reconciler.Get(ctx, client.ObjectKeyFromObject(tc.tfJob), &updatedTFJob)).Should(Succeed())

Actually, I tried the code above, but I couldn't get the TFJob with a single call. So we must wait for the cache to synchronize.

ref: 825cc27
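
The difference between a single read and a polled read can be illustrated with a minimal, standard-library-only sketch. It does not use Gomega or controller-runtime: `laggingCache`, `eventually`, `errNotFound`, and the `timeout` and `interval` values are hypothetical stand-ins chosen for illustration, and the real cache may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins for the timeout and interval used by the test.
const (
	timeout  = 2 * time.Second
	interval = 10 * time.Millisecond
)

var errNotFound = errors.New("not found in cache")

// laggingCache simulates a read cache that only sees a newly created
// object after a few reads have been attempted.
type laggingCache struct {
	readsUntilSynced int
}

// Get fails until the simulated cache has caught up.
func (c *laggingCache) Get(name string) (string, error) {
	if c.readsUntilSynced > 0 {
		c.readsUntilSynced--
		return "", errNotFound
	}
	return name, nil
}

// eventually calls fn every interval until it returns nil or the
// timeout elapses, in which case the last error is returned wrapped.
func eventually(fn func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out: %w", err)
		}
		time.Sleep(interval)
	}
}

func main() {
	cache := &laggingCache{readsUntilSynced: 3}

	// A single read right after creation fails because the cache lags.
	_, err := cache.Get("test-job")
	fmt.Println("single read error:", err)

	// Polling succeeds once the simulated cache catches up.
	err = eventually(func() error {
		_, getErr := cache.Get("test-job")
		return getErr
	}, timeout, interval)
	fmt.Println("polled read error:", err)
}
```

In this simulation the one-shot read corresponds to the `Expect(...)` variant and the polled read to the `Eventually(...)` variant.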

Member Author

I believe we can avoid the following error by getting the TFJob again after creating it.

  Expected success, but got an error:
      <*errors.StatusError | 0xc0019a06e0>: {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "Operation cannot be fulfilled on tfjobs.kubeflow.org \"test-bof-0\": the object has been modified; please apply your changes to the latest version and try again",
              Reason: "Conflict",
              Details: {
                  Name: "test-bof-0",
                  Group: "kubeflow.org",
                  Kind: "tfjobs",
                  UID: "",
                  Causes: nil,
                  RetryAfterSeconds: 0,
              },
              Code: 409,
          },
      }
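
The 409 Conflict above is the usual symptom of updating an object from an outdated copy. A minimal sketch of that behavior follows, assuming a simplified in-memory store: `job`, `store`, and `errConflict` are hypothetical names, and the integer `ResourceVersion` is a simplification (Kubernetes treats resource versions as opaque strings).

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// job is a hypothetical, heavily simplified stand-in for a stored object.
type job struct {
	Name            string
	ResourceVersion int
	TTLSeconds      int
}

// store simulates a server that enforces optimistic concurrency.
type store struct {
	current job
}

// Get returns a copy of the latest stored object.
func (s *store) Get() job { return s.current }

// Update rejects writes based on an outdated resource version and
// bumps the version on every accepted write.
func (s *store) Update(j job) error {
	if j.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	j.ResourceVersion++
	s.current = j
	return nil
}

func main() {
	s := &store{current: job{Name: "test-job", ResourceVersion: 1}}

	// The test keeps a copy taken right after creation.
	stale := s.Get()

	// Another writer (for example, a controller) updates the object.
	byController := s.Get()
	byController.TTLSeconds = 10
	if err := s.Update(byController); err != nil {
		fmt.Println("unexpected error:", err)
	}

	// Updating from the stale copy is rejected.
	stale.TTLSeconds = 2
	fmt.Println("stale update:", s.Update(stale))

	// Re-reading the latest version first lets the update through.
	latest := s.Get()
	latest.TTLSeconds = 2
	fmt.Println("fresh update:", s.Update(latest))
}
```

Fetching the object again before modifying it, as the change under review does, keeps the write based on the latest version in this model.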

@tenzen-y tenzen-y changed the title WIP: Fix flaky test: [It] should delete job when expired time is up Fix flaky test: [It] should delete job when expired time is up May 20, 2023
@johnugeorge
Member

Thanks @tenzen-y
/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 717f1de into kubeflow:master May 22, 2023
@tenzen-y tenzen-y deleted the fix-flaky-ttl-test branch May 22, 2023 06:52
Development

Successfully merging this pull request may close these issues.

Flaky test: Test TTL Seconds After Finished
3 participants