
[chore][extension/observer/k8sobserver] try to fix flaky test (DATA RACE) of TestExtensionObservePods #29463

Merged (2 commits) on Dec 1, 2023

Conversation

@sakulali sakulali commented Nov 23, 2023

Description:
Try to fix the flaky test (DATA RACE) of TestExtensionObservePods. Since config.ObservePods is set to true by default, TestExtensionObserveServices and TestExtensionObserveNodes also start k8sObserver.podListerWatcher, even though those tests only exercise the services and nodes observers. Skipping podListerWatcher in those tests avoids the data race within the shared test process.

Link to tracking Issue:
#29448

Testing:
go test for the k8sobserver package

Preparation:

DIR = opentelemetry-collector-contrib/extension/observer/k8sobserver
DATA_DIR = /path/to/otel/coverage/unit
CMD = go test -run TestExtensionObserve -v -count=1 -race -timeout 300s -parallel 4 --tags="" -cover ./... -covermode=atomic -args -test.gocoverdir=${DATA_DIR}
MULTI_CMD = for i in {1..50}; do echo "Run $i"; ${CMD}; done
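
The repeat loop can be sketched as a small script. CMD here is a placeholder for the full `go test -race ...` invocation above (it defaults to `true` so the sketch runs standalone); the loop stops at the first failing run, which is how a flaky race-detector failure surfaces:

```shell
# Repeat a command N times and stop at the first failure.
# CMD is a placeholder; substitute the real `go test -race ...` invocation.
CMD="${CMD:-true}"
for i in 1 2 3; do
  echo "Run $i"
  $CMD || { echo "failed on run $i" >&2; exit 1; }
done
```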

Tests:

  1. I found that config.ObservePods is set to true by default. As a result, TestExtensionObserveServices and TestExtensionObserveNodes also enable k8sObserver.podListerWatcher and access the same HTTP proxy, since they run in the same process.

    // CreateDefaultConfig creates the default configuration for the extension.
    func createDefaultConfig() component.Config {
        return &Config{
            APIConfig:    k8sconfig.APIConfig{AuthType: k8sconfig.AuthTypeServiceAccount},
            ObservePods:  true,
            ObserveNodes: false,
        }
    }
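
The shape of the fix can be sketched as follows. The Config struct here is a hypothetical mirror of just the two relevant fields (the real one lives in the k8sobserver package): a test that only exercises the services or nodes observer flips ObservePods off so podListerWatcher is never started.

```go
package main

import "fmt"

// Config is a hypothetical mirror of the extension's config with only
// the two fields relevant to this fix.
type Config struct {
	ObservePods  bool
	ObserveNodes bool
}

// createDefaultConfig mirrors the factory default quoted above:
// pods are observed by default, nodes are not.
func createDefaultConfig() *Config {
	return &Config{ObservePods: true, ObserveNodes: false}
}

func main() {
	// A services- or nodes-only test overrides the default so the shared
	// pod lister/watcher (and its HTTP transport) is never touched.
	cfg := createDefaultConfig()
	cfg.ObservePods = false
	fmt.Println(cfg.ObservePods, cfg.ObserveNodes)
}
```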

  2. I added a log line (shown below) to the TestExtensionObserve* functions and confirmed that the test cases run in the same process. As noted on Stack Overflow, Go tests are executed as goroutines and may run concurrently.


--------- TestExtensionObserveServices process ID: 38910
--------- TestExtensionObservePods process ID: 38910
--------- TestExtensionObserveNodes process ID: 38910

  3. I added a log (shown below) for the HTTP proxy address setting and found that the tests use the same address:

-------------- use proxy connectMethodForRequest t.Proxy addr: 0xc0007142f0 -------------------
-------------- transport t.Proxy (is) nil
-------------- send request: {Method:GET URL:https://mock:12345/api/v1/pods?limit=500&resourceVersion=0 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Accept:[application/json, /] User-Agent:[k8sobserver.test/v0.0.0 (darwin/amd64) kubernetes/$Format]] Body: GetBody: ContentLength:0 TransferEncoding:[] Close:false Host:mock:12345 Form:map[] PostForm:map[] MultipartForm: Trailer:map[] RemoteAddr: RequestURI: TLS: Cancel: Response: ctx:0xc000410120}

  4. So one goroutine is writing the HTTP proxy while another goroutine reads it (during the HTTP request for the pods lister). This is the data race.

Inference:
I found that config.ObservePods is set to true by default. The TestExtensionObserveServices and TestExtensionObserveNodes test cases only exercise the services and nodes observers, so they don't need k8sObserver.podListerWatcher; disabling it avoids the data race within the same process.

Documentation:

@github-actions github-actions bot requested a review from dmitryax November 23, 2023 16:15
@sakulali sakulali marked this pull request as ready for review November 23, 2023 16:16
@sakulali sakulali requested a review from a team November 23, 2023 16:16
@crobert-1 (Member)
I'm a bit worried about removing the proxy setting when it's set to nil on purpose. Is it somehow unnecessary now? Otherwise, is there some way to make this operation work with concurrent access?

@sakulali (Contributor Author)

> I'm a bit worried about removing the proxy setting when it's set to nil on purpose. Is it somehow unnecessary now? Otherwise, is there some way to make this operation work with concurrent access?

Thanks for your review, @crobert-1! I shouldn't remove this proxy setting; I need to investigate the root cause further before taking any action.

@sakulali sakulali marked this pull request as draft November 29, 2023 03:38
@sakulali sakulali changed the title [chore][extension/observer/k8sobserver] try to fix flaky test (DATA RACE) of TestExtensionObservePods WIP: [chore][extension/observer/k8sobserver] try to fix flaky test (DATA RACE) of TestExtensionObservePods Nov 29, 2023
…ACE) of TestExtensionObservePods

Signed-off-by: sakulali <sakulali@126.com>
…same process using podListerWatcher

Signed-off-by: sakulali <sakulali@126.com>
@sakulali sakulali force-pushed the fix-k8sobserver-flaky-test branch from 48f1e37 to 9768453 Compare November 30, 2023 15:58
@sakulali sakulali changed the title WIP: [chore][extension/observer/k8sobserver] try to fix flaky test (DATA RACE) of TestExtensionObservePods [chore][extension/observer/k8sobserver] try to fix flaky test (DATA RACE) of TestExtensionObservePods Nov 30, 2023
@sakulali sakulali marked this pull request as ready for review November 30, 2023 16:39

sakulali commented Nov 30, 2023

Hi @crobert-1, could you help review this again? After adding debug logs, I discovered some interesting things and we may be able to resolve this issue; please refer to the Testing section above.

@crobert-1 crobert-1 left a comment

Looks good to me, thanks for including all of the references as well!

@crobert-1 crobert-1 added the flaky test label Nov 30, 2023
@mx-psi mx-psi merged commit 9db03ec into open-telemetry:main Dec 1, 2023
89 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 1, 2023
@sakulali sakulali deleted the fix-k8sobserver-flaky-test branch December 1, 2023 13:23