[Flaky test] TestFlowAggregator/IPv4/InterNodeFlows #3650

Closed
antoninbas opened this issue Apr 15, 2022 · 2 comments · Fixed by #3655
Labels
area/flow-visibility/aggregator Issues or PRs related to Flow Aggregator kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.

Comments

@antoninbas
Contributor

Describe the bug
I observed a failure of TestFlowAggregator/IPv4/InterNodeFlows in Kind CI, for the "E2e tests on a Kind cluster on Linux with AntreaProxy all Service support" job (don't know if the exact job is relevant):

2022-04-14T06:07:08.0155983Z === RUN   TestFlowAggregator/IPv4/InterNodeFlows
2022-04-14T06:07:08.0159474Z I0414 06:07:08.015721   17954 k8s_util.go:731] Creating/updating Antrea NetworkPolicy antrea-test/test-flow-aggregator-antrea-networkpolicy-ingress
2022-04-14T06:07:08.0305576Z I0414 06:07:08.030293   17954 k8s_util.go:731] Creating/updating Antrea NetworkPolicy antrea-test/test-flow-aggregator-antrea-networkpolicy-egress
2022-04-14T06:07:08.1416689Z     flowaggregator_test.go:958: Antrea Network Policies are realized.
2022-04-14T06:07:20.5808986Z     flowaggregator_test.go:690: Throughput check on record with flowEndSeconds-flowStartSeconds: 4, Iperf throughput: 10.00 Mbits/s, IPFIX record throughput: 3.94 Mbits/s
2022-04-14T06:07:20.5810485Z     flowaggregator_test.go:691: 
2022-04-14T06:07:20.5811149Z         	Error Trace:	flowaggregator_test.go:691
2022-04-14T06:07:20.5812029Z         	            				flowaggregator_test.go:344
2022-04-14T06:07:20.5813299Z         	Error:      	Max difference between 3.937286 and 10 allowed is 1.5, but difference was -6.062714
2022-04-14T06:07:20.5814009Z         	Test:       	TestFlowAggregator/IPv4/InterNodeFlows

Full test logs: https://gist.github.com/antoninbas/f543aa2e9eb2c34cba41edf9825bf39e
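
The failing assertion is a tolerance check between the Iperf throughput and the throughput derived from the IPFIX record. As a minimal sketch of that arithmetic only (the helper `withinDelta` is hypothetical and is not the code in flowaggregator_test.go; the three numbers are copied from the log above):

```go
package main

import (
	"fmt"
	"math"
)

// withinDelta is a hypothetical helper: it reports whether expected and
// actual differ by at most delta.
func withinDelta(expected, actual, delta float64) bool {
	return math.Abs(expected-actual) <= delta
}

func main() {
	// Values taken from the CI log in this issue.
	const iperfMbps = 10.0      // Iperf throughput
	const recordMbps = 3.937286 // IPFIX record throughput
	const maxDelta = 1.5        // allowed difference

	diff := recordMbps - iperfMbps
	fmt.Printf("difference: %f, within delta: %t\n",
		diff, withinDelta(iperfMbps, recordMbps, maxDelta))
	// prints "difference: -6.062714, within delta: false"
}
```

The record throughput is off by roughly 6 Mbits/s, four times the allowed 1.5, so this is not a borderline tolerance failure.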

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. area/flow-visibility/aggregator Issues or PRs related to Flow Aggregator labels Apr 15, 2022
@antoninbas
Contributor Author

@heanlan @dreamtalen could someone take a look at this?

@heanlan
Contributor

heanlan commented Apr 15, 2022

@antoninbas ack

@heanlan heanlan self-assigned this Apr 15, 2022
heanlan added a commit to heanlan/antrea that referenced this issue Apr 19, 2022
The conntrack connection store's polling goroutine and the flow exporter both
access the conntrack connection store, which leads to a race condition.

In the polling goroutine, `deleteIfStaleOrResetConn` and `AddOrUpdateConn`
each grab the lock, modify the `conn.IsPresent` field, and release the lock.
Between the execution of these two functions, it is likely that the
FlowExporter's timer is triggered and the exporter reads a wrong
`conn.IsPresent` value from an intermediate state. We fix this by holding the
lock until both functions have finished executing.

Fixes: antrea-io#3650

Signed-off-by: heanlan <hanlan@vmware.com>
heanlan added a commit to heanlan/antrea that referenced this issue Apr 21, 2022
antoninbas pushed a commit that referenced this issue Apr 25, 2022
The conntrack connection store's polling goroutine and the flow exporter both
access the conntrack connection store, which leads to a race condition.

In the polling goroutine, `deleteIfStaleOrResetConn` and `AddOrUpdateConn`
each grab the lock, modify the `conn.IsPresent` field, and release the lock.
Between the execution of these two functions, it is likely that the
FlowExporter's timer is triggered and the exporter reads a wrong
`conn.IsPresent` value from an intermediate state. We fix this by holding the
lock until both functions have finished executing.

Fixes: #3650

Signed-off-by: heanlan <hanlan@vmware.com>
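
The locking change described in the commit message can be illustrated with a simplified sketch. This is not the Antrea code: the `store` and `conn` types, the `resetLocked` and `addOrUpdateLocked` helpers, and `countStaleReads` are hypothetical stand-ins, and the reset step is reduced to clearing `IsPresent`. The point it shows is that when the poller holds a single lock across both steps, a concurrent reader can never observe the intermediate `IsPresent == false` state for a live connection.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a conntrack connection entry.
type conn struct {
	IsPresent bool
}

// store is a hypothetical, simplified connection store guarded by one mutex.
type store struct {
	mu    sync.Mutex
	conns map[string]*conn
}

// resetLocked marks every known connection as absent before a poll.
// The caller must hold s.mu.
func (s *store) resetLocked() {
	for _, c := range s.conns {
		c.IsPresent = false
	}
}

// addOrUpdateLocked marks the polled connections as present.
// The caller must hold s.mu.
func (s *store) addOrUpdateLocked(keys []string) {
	for _, k := range keys {
		if c, ok := s.conns[k]; ok {
			c.IsPresent = true
		} else {
			s.conns[k] = &conn{IsPresent: true}
		}
	}
}

// Poll holds the lock across BOTH steps, so the state between the reset
// and the update is never visible to other goroutines.
func (s *store) Poll(keys []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.resetLocked()
	s.addOrUpdateLocked(keys)
}

// IsPresent plays the role of the exporter's read of the flag.
func (s *store) IsPresent(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.conns[key]
	return ok && c.IsPresent
}

// countStaleReads polls in one goroutine while reading in another and
// returns how often the reader saw a live connection marked absent.
func countStaleReads(iterations int) int {
	s := &store{conns: map[string]*conn{}}
	s.Poll([]string{"flow-1"})

	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < iterations; i++ {
			s.Poll([]string{"flow-1"})
		}
	}()

	stale := 0
	for {
		select {
		case <-done:
			return stale
		default:
			if !s.IsPresent("flow-1") {
				stale++
			}
		}
	}
}

func main() {
	fmt.Println("stale reads:", countStaleReads(10000))
	// prints "stale reads: 0"
}
```

If `Poll` instead locked and unlocked separately around each helper, the reader could acquire the lock between the two steps and see `IsPresent == false`, which is the intermediate state the commit message describes.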