Handle exceptions gracefully when delete non-existent resources during integ test resource clean up #1154

Merged
merged 5 commits into from
Feb 10, 2025

Conversation

weijia-aws
Contributor

@weijia-aws weijia-aws commented Jan 30, 2025

Description

When an integ test runs, it creates a model, index, pipeline, etc., performs some assertions, and finally cleans up the resources.

public void processor_integrationTest() {
    try {
        // load model
        // create pipeline/index
        // do tests/asserts
    } finally {
        // cleanup pipeline, index, model
    }
}

If resource creation fails with an exception, the test enters the finally block and tries to delete the resources. Because those resources don't exist, the delete calls throw exceptions of their own; see issue #1091. I was able to reproduce the exception by following #1093 (comment). In this case, we should handle such NOT FOUND exceptions gracefully.
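To illustrate the idea, here is a minimal sketch of a delete helper that treats a missing resource as a no-op. It is not the code from this PR: `NotFoundException`, `FakeCluster`, and `CleanupHelper.deleteIfExists` are hypothetical names, and the in-memory set stands in for a real cluster so the example is self-contained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the NOT FOUND error a cluster returns on delete.
class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
        super(message);
    }
}

// Hypothetical in-memory stand-in for the cluster holding models, indices, pipelines.
class FakeCluster {
    private final Set<String> resources = new HashSet<>();

    void create(String name) {
        resources.add(name);
    }

    // Mirrors the failing behavior: deleting a missing resource throws.
    void delete(String name) {
        if (!resources.remove(name)) {
            throw new NotFoundException("resource not found: " + name);
        }
    }

    boolean exists(String name) {
        return resources.contains(name);
    }
}

class CleanupHelper {
    // Deletes the resource if present. Returns false, instead of throwing,
    // when the resource was never created or its name is null.
    static boolean deleteIfExists(FakeCluster cluster, String name) {
        if (name == null) {
            return false; // creation failed before a name or id was assigned
        }
        try {
            cluster.delete(name);
            return true;
        } catch (NotFoundException e) {
            return false; // nothing to clean up; do not mask the original test failure
        }
    }
}
```

With a helper like this in the finally block, the exception from the failed creation step is the one that surfaces, rather than a secondary NOT FOUND error from cleanup. Only the not-found case is swallowed; any other failure during deletion still propagates.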

Related Issues

Resolves #1098

Related issue: #1091 and #1093

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@heemin32
Collaborator

Can we expand the scope of this PR to shift the responsibility for resource cleanup from individual tests to the framework? This way, each test wouldn’t need to handle resource cleanup individually.

@vibrantvarun
Member

vibrantvarun commented Jan 31, 2025

Can we expand the scope of this PR to shift the responsibility for resource cleanup from individual tests to the framework? This way, each test wouldn’t need to handle resource cleanup individually.

@heemin32 I would suggest not doing that. Integ tests are meant to run in isolation. With resource cleanup at the framework level, tests could start sharing resources with each other, and critical issues could go undetected at the integ test level because a resource might be created by one method and used by another.

@vibrantvarun
Member

LGTM

@minalsha
Collaborator

#1154 (comment)

+1 to Varun's comment here.

@heemin32
Collaborator

heemin32 commented Jan 31, 2025

@heemin32 I would suggest not doing that. Integ tests are meant to run in isolation. With resource cleanup at the framework level, tests could start sharing resources with each other, and critical issues could go undetected at the integ test level because a resource might be created by one method and used by another.

Have you seen any such issue in the k-nn repo, where resource cleanup is handled at the framework level? The resource-sharing issue can happen when cleanup is done at the individual test level and a developer forgets to clean up a resource in a test.

@vibrantvarun vibrantvarun self-requested a review January 31, 2025 22:59
…g integ test resource clean up

Signed-off-by: Weijia Zhao <zweijia@amazon.com>
Signed-off-by: Weijia Zhao <zweijia@amazon.com>

codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.65%. Comparing base (e8ed3a4) to head (e296bc6).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1154      +/-   ##
============================================
- Coverage     81.72%   81.65%   -0.08%     
+ Complexity     2494     1245    -1249     
============================================
  Files           186       93      -93     
  Lines          8426     4213    -4213     
  Branches       1428      714     -714     
============================================
- Hits           6886     3440    -3446     
+ Misses         1000      501     -499     
+ Partials        540      272     -268     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Weijia Zhao <zweijia@amazon.com>
Collaborator

@heemin32 heemin32 left a comment

LGTM. Thanks!

heemin32
heemin32 previously approved these changes Feb 6, 2025
@heemin32 heemin32 added the backport 2.x Label will add auto workflow to backport PR to 2.x branch label Feb 6, 2025
@heemin32
Collaborator

heemin32 commented Feb 6, 2025

Test is failing. Please take a look.

@weijia-aws
Contributor Author

Test is failing. Please take a look.

The qa tests are failing because testAgainstNewCluster depends on resources that are created in testAgainstOldCluster. When testAgainstOldCluster finishes, all resources are cleaned up, so the testAgainstNewCluster tests fail with java.lang.NullPointerException or a resource-not-found exception. In order to fix this, we will need to re-create the resources in the testAgainstNewCluster tasks.

@weijia-aws
Contributor Author

In order to fix this, we will need to re-create resources in testAgainstNewCluster tasks

Could we also modify the cleanUp method so that it does not delete resources when running against the old cluster?

@weijia-aws
Contributor Author

Could we also modify the cleanUp method so that it does not delete resources when running against the old cluster?

I think we should do this, since it matches the current behavior and requires fewer code changes than re-creating the resources.

@heemin32
Collaborator

heemin32 commented Feb 6, 2025

Could we also modify the cleanUp method so that it does not delete resources when running against the old cluster?

I think we should do this, since it matches the current behavior and requires fewer code changes than re-creating the resources.

Re-creating the resources defeats the purpose of the bwc test, where we validate that a resource created in the old cluster still works in the new cluster.

How about exposing a method that tells whether a subclass wants its resources cleaned up? Then, for the bwc tests, we can skip the automatic deletion and delete the resources manually, as we do today.
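A rough sketch of what such an opt-out hook could look like, assuming a shared base class that runs cleanup after each test. All names here (`BaseIntegTestSketch`, `shouldCleanUpResources`, `BwcTestSketch`) are hypothetical and chosen for illustration; they are not taken from the merged change, and the list of names stands in for real cluster resources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; names are illustrative, not the repo's actual API.
abstract class BaseIntegTestSketch {
    // Names of resources created by the test (stand-in for models, indices, pipelines).
    private final List<String> liveResources = new ArrayList<>();

    void createResource(String name) {
        liveResources.add(name);
    }

    int liveResourceCount() {
        return liveResources.size();
    }

    // Hook: a subclass returns false to opt out of automatic cleanup.
    protected boolean shouldCleanUpResources() {
        return true;
    }

    // Intended to be called by the framework after each test.
    final void cleanUp() {
        if (!shouldCleanUpResources()) {
            return; // leave resources in place, e.g. for a later bwc phase
        }
        liveResources.clear();
    }
}

// Regular integ test: inherits the default and gets automatic cleanup.
class RegularTestSketch extends BaseIntegTestSketch {
}

// bwc test: keeps resources created against the old cluster so the
// new-cluster phase can validate them; deletion is done manually later.
class BwcTestSketch extends BaseIntegTestSketch {
    @Override
    protected boolean shouldCleanUpResources() {
        return false;
    }
}
```

Defaulting the hook to true keeps ordinary tests isolated without any per-test code, while the bwc tests opt out explicitly and stay responsible for their own deletion.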

Signed-off-by: Weijia Zhao <zweijia@amazon.com>
@heemin32 heemin32 dismissed their stale review February 10, 2025 17:22

Unresolved comments

Signed-off-by: Weijia Zhao <zweijia@amazon.com>
Collaborator

@heemin32 heemin32 left a comment

LGTM. Thanks!

@heemin32
Collaborator

The failure in bwc is a known issue.
#1142

@heemin32 heemin32 merged commit 7dc84b5 into opensearch-project:main Feb 10, 2025
71 of 75 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1154-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7dc84b50d3b1727cafb4b07342df9ddff2561ffd
# Push it to GitHub
git push --set-upstream origin backport/backport-1154-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1154-to-2.x.

Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch good first issue Good for newcomers Infrastructure skip-changelog
Development

Successfully merging this pull request may close these issues.

[Infrastructure] Wipe integration test resources and surface exceptions gracefully
6 participants