[Flaky Test] TestComponentBuildHashInDiagnostics improve agent state check #5420

Conversation

AndersonQ
Member

@AndersonQ AndersonQ commented Sep 4, 2024

What does this PR do?

Improves the agent status check on TestComponentBuildHashInDiagnostics to ensure the agent is healthy and has components.

Why is it important?

The test is flaky because its status check did not account for the agent reporting healthy while it has no components yet.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

  • N/A

How to test this PR locally

  • Run the test with TEST_RUN_UNTIL_FAILURE until you're satisfied or it fails again.

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@AndersonQ AndersonQ added the flaky-test label (Unstable or unreliable test cases.) Sep 4, 2024
@AndersonQ AndersonQ self-assigned this Sep 4, 2024
@AndersonQ AndersonQ requested a review from a team as a code owner September 4, 2024 08:55
Contributor

mergify bot commented Sep 4, 2024

This pull request does not have a backport label. Could you fix it @AndersonQ? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label Sep 4, 2024
@@ -130,13 +148,13 @@ func TestComponentBuildHashInDiagnostics(t *testing.T) {
allHealthy,
5*time.Minute, 10*time.Second,
"agent never became healthy. Last status: %v", &stateBuff)
- t.Cleanup(func() {
+ defer func() {
Contributor

Why are we switching from Cleanup to defer here? As far as I'm aware these should be interchangeable if we aren't using subtests.

Member Author

One of them attempts to move a file from a temp dir to a persistent one, and it failed. My guess is that the temp directory was cleaned up before my function ran, so I'm trying to avoid that by using a plain defer.

@AndersonQ AndersonQ added the skip-changelog, backport-8.15 (Automated backport to the 8.15 branch with mergify), and backport-v8.x labels and removed the backport-skip label Sep 4, 2024

if len(status.Components) == 0 {
stateBuff.WriteString(fmt.Sprintf(
"healty but without components: agent status: %s-%s",
Member

Suggested change
- "healty but without components: agent status: %s-%s",
+ "healthy but without components: agent status: %s-%s",

for _, c := range status.Components {
bs, err := json.MarshalIndent(status, "", " ")
if err != nil {
- stateBuff.WriteString(fmt.Sprintf("%s not health, could not marshal status outptu: %v",
+ stateBuff.WriteString(fmt.Sprintf(
+ "%s not health, could not marshal status outptu: %v",
Member

Suggested change
- "%s not health, could not marshal status outptu: %v",
+ "%s not healthy, could not marshal status output: %v",

return false
}

if len(status.Components) == 0 {
Contributor

Please add a comment above this explaining why the check is required, since it is what actually fixed the flakiness in the test.

@AndersonQ AndersonQ enabled auto-merge (squash) September 5, 2024 04:57
@AndersonQ AndersonQ merged commit 116e73f into elastic:main Sep 5, 2024
12 checks passed

mergify bot pushed a commit that referenced this pull request Sep 5, 2024
…check (#5420)

ensure the agent status has components, all components are healthy and the version info is up-to-date

(cherry picked from commit 116e73f)

# Conflicts:
#	testing/integration/package_version_test.go
mergify bot pushed a commit that referenced this pull request Sep 5, 2024
…check (#5420)

ensure the agent status has components, all components are healthy and the version info is up-to-date

(cherry picked from commit 116e73f)

# Conflicts:
#	testing/integration/package_version_test.go
@AndersonQ AndersonQ deleted the 5333-flaky-TestComponentBuildHashInDiagnostics branch September 5, 2024 12:44
AndersonQ added a commit that referenced this pull request Sep 9, 2024
…cs improve agent state check (#5435)

* [Flaky Test] TestComponentBuildHashInDiagnostics improve agent state check (#5420)

ensure the agent status has components, all components are healthy and the version info is up-to-date

(cherry picked from commit 116e73f)

* manually backport de3dec4

---------

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Labels
backport-8.15 (Automated backport to the 8.15 branch with mergify), flaky-test (Unstable or unreliable test cases.), skip-changelog
Development

Successfully merging this pull request may close these issues.

[Flaky Test]: TestComponentBuildHashInDiagnostics – build hash is empty
4 participants