
TestCertManagerAddon flakes occasionally #159

Closed
1 task done
rainest opened this issue Nov 18, 2021 · 1 comment · Fixed by #165
Assignees
Labels
area/ci bug Something isn't working priority/medium

Comments

@rainest
Contributor

rainest commented Nov 18, 2021

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Some test runs for tests-and-coverage fail with:

 === CONT  TestCertManagerAddon
    certmanager_test.go:25: 
        	Error Trace:	certmanager_test.go:25
        	Error:      	Received unexpected error:
        	            	failed to deploy YAML STDOUT=() STDERR=(Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.96.126.96:443: connect: connection refused
        	            	): exit status 1
        	Test:       	TestCertManagerAddon
--- FAIL: TestCertManagerAddon (86.36s)

Re-running usually fixes this.

Expected Behavior

Test should succeed consistently with the same code.

Steps To Reproduce

Unknown. Probably a mild race condition somewhere: we likely need to wait a bit, or confirm readiness of some service, before attempting the action.

Kong Kubernetes Testing Framework Version

No response

Kubernetes version

No response

Anything else?

No response

@rainest rainest added the bug Something isn't working label Nov 18, 2021
@shaneutt shaneutt self-assigned this Nov 29, 2021
@shaneutt
Contributor

The current logic is intended to wait for the webhook deployment's pods to become available, so this report suggests an upstream issue: we can't rely on that availability alone, as there appears to be a very brief window in which a pod is reported available but not yet actually ready to serve requests. Testing locally I couldn't trigger the failure myself, so the window must be quite narrow. I've created #165 to resolve this, which adds a job that waits for HTTP requests to the webhook to actually succeed over the cluster network before considering the addon "ready".
