Flow exporter memory bloat when unable to connect with downstream collector #3972

Closed
wsquan171 opened this issue Jul 7, 2022 · 1 comment


wsquan171 commented Jul 7, 2022

Describe the bug
When the flow exporter is enabled but fails to connect to the downstream IPFIX collector, the antrea-agent runs into various memory issues. In a scale setup with limited worker node memory, iptables can crash due to memory exhaustion, and a host reboot is needed to recover the node.

The issue was introduced as part of the priority queue refactor (#2360), mainly in the following aspects:

Stale connection objects cannot be evicted from the buffer (ExpirePriorityQueue.items and ExpirePriorityQueue.KeyToItem) when the export process is dead.

Prior to the PQ refactor, the flowRecords map acted as the buffer between the connection store map and the flow export process. Even if the export process was dead, the store polling goroutine would periodically expire stale flow records:

		record, exists := cs.flowRecords.GetFlowRecordFromMap(&key)
		if exists {
			// Delete the connection if it was not exported for the time
			// period as specified by the stale connection timeout.
			if time.Since(record.LastExportTime) >= cs.staleConnectionTimeout {
				// Ignore error if flow record not found.
				cs.flowRecords.DeleteFlowRecordFromMap(&key)
				delete(cs.connections, key)
			}
		}

With the current approach, cleanup only runs as part of sendFlowRecords (-> GetExpiredConns -> UpdateConnAndQueue -> RemoveItemFromMap), which cannot happen if the export process is dead.
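To illustrate why the eviction has to live on the polling side, here is a minimal standalone sketch (not Antrea code: connectionStore, pqKeyToItem, and expireStale below are simplified stand-ins) in which the polling goroutine drops stale entries from both the connection map and the queue's key map, independent of whether the exporter is alive:

package main

import (
	"fmt"
	"time"
)

// Minimal sketch only: the types below are simplified stand-ins, not Antrea's.
// The point is that the polling goroutine owns the eviction, so stale entries
// leave both the connection map and the queue's key map even when the export
// process never runs.

type connKey string

type connection struct {
	lastExportTime time.Time
}

type connectionStore struct {
	connections            map[connKey]*connection
	pqKeyToItem            map[connKey]*connection // stands in for ExpirePriorityQueue.KeyToItem
	staleConnectionTimeout time.Duration
}

// expireStale runs on the polling goroutine and does not depend on a live
// export process.
func (cs *connectionStore) expireStale(now time.Time) {
	for key, conn := range cs.connections {
		if now.Sub(conn.lastExportTime) >= cs.staleConnectionTimeout {
			delete(cs.pqKeyToItem, key) // drop the queued copy as well
			delete(cs.connections, key)
		}
	}
}

func main() {
	cs := &connectionStore{
		connections:            map[connKey]*connection{},
		pqKeyToItem:            map[connKey]*connection{},
		staleConnectionTimeout: 5 * time.Minute,
	}
	key := connKey("10.10.1.5:51590->10.10.1.2:80/6")
	conn := &connection{lastExportTime: time.Now().Add(-10 * time.Minute)}
	cs.connections[key] = conn
	cs.pqKeyToItem[key] = conn

	cs.expireStale(time.Now())
	fmt.Println("conn map size:", len(cs.connections), "pq map size:", len(cs.pqKeyToItem)) // both 0
}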

Leaked pqItem in ExpirePriorityQueue due to uninitialized LastExportTime

During the connection store's periodic polling, connections determined to be stale are removed from the connection map:

			if conn.ReadyToDelete || time.Since(conn.LastExportTime) >= cs.staleConnectionTimeout {
				if err := cs.deleteConnWithoutLock(key); err != nil {
					return err
				}
			}

However, conn.LastExportTime is only set during the sendFlowRecords workflow (in UpdateConnAndQueue). When the exporting process is dead, time.Since(conn.LastExportTime) always returns a very large duration because it is compared against the zero value, causing such connections to be removed from the conn store map as soon as conntrack removes the flow.

I0707 01:21:44.779478 1 conntrack_connections.go:115] conn [10.10.1.5 51590 10.10.1.2 80 6] deleted due to timeout 2562047h47m16.854775807s
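The 2562047h47m16.854775807s in the log is simply Go's maximum time.Duration: subtracting the zero time.Time overflows, and Time.Sub saturates at math.MaxInt64 nanoseconds. A standalone snippet reproducing the logged value:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Zero value, as on a connection the exporter never touched.
	var lastExportTime time.Time
	// The subtraction overflows, and Time.Sub saturates at math.MaxInt64
	// nanoseconds, which prints as 2562047h47m16.854775807s -- the value
	// seen in the agent logs above.
	fmt.Println(time.Since(lastExportTime))
	// Any reasonable staleConnectionTimeout is therefore always exceeded.
	fmt.Println(time.Since(lastExportTime) >= 5*time.Minute) // true
}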

In that case, duplicate entries with the same flow key are added to the PQ instead of properly updating the existing item; note the difference between the pq map size and the pq queue size in the logs below (a sketch of a duplicate-key guard follows the log excerpt).

I0707 01:12:34.779323 1 conntrack_connections.go:99] conn map size 6, pq map size 12, pq queue size 12
I0707 01:12:34.779333 1 conntrack_connections.go:115] conn [10.10.1.5 51602 10.10.1.2 80 6] deleted due to timeout 2562047h47m16.854775807s
I0707 01:12:39.779637 1 conntrack_connections.go:99] conn map size 0, pq map size 12, pq queue size 12
I0707 01:14:14.782239 1 conntrack_connections.go:269] "New Antrea flow added" connection=&{ID:1745388421 Timeout:118 StartTime:2022-07-07 01:14:13.506199834 +0000 UTC StopTime:2022-07-07 01:14:14.781683955 +0000 UTC m=+243.226745667 LastExportTime:0001-01-01 00:00:00 +0000 UTC IsActive:true IsPresent:true ReadyToDelete:false Zone:65520 Mark:3 StatusFlag:14 Labels:[] LabelsMask:[] FlowKey:{SourceAddress:10.10.1.5 DestinationAddress:10.10.1.2 Protocol:6 SourcePort:51602 DestinationPort:80} OriginalPackets:6 OriginalBytes:397 SourcePodNamespace:default SourcePodName:busybox DestinationPodNamespace:echoserver DestinationPodName:echoserver-869f89668b-w4jql DestinationServicePortName: DestinationServiceAddress:10.10.1.2 DestinationServicePort:80 IngressNetworkPolicyName: IngressNetworkPolicyNamespace: IngressNetworkPolicyType:0 IngressNetworkPolicyRuleName: IngressNetworkPolicyRuleAction:0 EgressNetworkPolicyName: EgressNetworkPolicyNamespace: EgressNetworkPolicyType:0 EgressNetworkPolicyRuleName: EgressNetworkPolicyRuleAction:0 PrevPackets:0 PrevBytes:0 ReversePackets:4 ReverseBytes:1518 PrevReversePackets:0 PrevReverseBytes:0 TCPState:TIME_WAIT PrevTCPState:}
I0707 01:14:19.779294 1 conntrack_connections.go:99] conn map size 1, pq map size 12, pq queue size 13
I0707 01:16:19.857491 1 conntrack_connections.go:115] conn [10.10.1.5 51602 10.10.1.2 80 6] deleted due to timeout 2562047h47m16.854775807s
I0707 01:16:24.779999 1 conntrack_connections.go:99] conn map size 5, pq map size 12, pq queue size 18
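A duplicate-key guard of the following kind would keep the queue and map sizes from drifting apart. This is a minimal standalone sketch, not Antrea's actual ExpirePriorityQueue: the simplified types only show the idea of updating an existing item in place (via heap.Fix) when the key is already present in the key map, instead of pushing a second item for the same flow key:

package main

import (
	"container/heap"
	"fmt"
	"time"
)

// Simplified stand-ins for illustration only; not Antrea's actual types.
type connKey string

type pqItem struct {
	key        connKey
	expireTime time.Time
	index      int
}

type expirePriorityQueue struct {
	items     []*pqItem
	keyToItem map[connKey]*pqItem
}

func (pq *expirePriorityQueue) Len() int { return len(pq.items) }
func (pq *expirePriorityQueue) Less(i, j int) bool {
	return pq.items[i].expireTime.Before(pq.items[j].expireTime)
}
func (pq *expirePriorityQueue) Swap(i, j int) {
	pq.items[i], pq.items[j] = pq.items[j], pq.items[i]
	pq.items[i].index, pq.items[j].index = i, j
}
func (pq *expirePriorityQueue) Push(x interface{}) {
	item := x.(*pqItem)
	item.index = len(pq.items)
	pq.items = append(pq.items, item)
}
func (pq *expirePriorityQueue) Pop() interface{} {
	old := pq.items
	n := len(old)
	item := old[n-1]
	pq.items = old[:n-1]
	return item
}

// addOrUpdate refreshes an existing item in place instead of pushing a
// duplicate entry for the same flow key, keeping len(items) == len(keyToItem).
func (pq *expirePriorityQueue) addOrUpdate(key connKey, expireTime time.Time) {
	if item, ok := pq.keyToItem[key]; ok {
		item.expireTime = expireTime
		heap.Fix(pq, item.index) // restore heap order after the in-place update
		return
	}
	item := &pqItem{key: key, expireTime: expireTime}
	heap.Push(pq, item)
	pq.keyToItem[key] = item
}

func main() {
	pq := &expirePriorityQueue{keyToItem: map[connKey]*pqItem{}}
	key := connKey("10.10.1.5:51602->10.10.1.2:80/6")
	// A re-established flow with the same key updates the existing item.
	pq.addOrUpdate(key, time.Now())
	pq.addOrUpdate(key, time.Now().Add(time.Minute))
	fmt.Println("pq map size:", len(pq.keyToItem), "pq queue size:", pq.Len()) // 1 and 1
}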

Before the PQ change, LastExportTime was set to conn.StartTime soon after the connection was dumped:

		record = flowexporter.FlowRecord{
			Conn:               *conn,
			PrevPackets:        0,
			PrevBytes:          0,
			PrevReversePackets: 0,
			PrevReverseBytes:   0,
			IsIPv6:             isIPv6,
			LastExportTime:     conn.StartTime,
			IsActive:           true,
			DyingAndDoneExport: false,
		}

FlowExporter expiredConns slice could exceed its expected size limit of 128

expiredConns is initialized with an expected size limit of 128 to bound the time for which the export lock is held:

		expiredConns:           make([]flowexporter.Connection, 0, maxConnsToExport*2),

However, during sendFlowRecords, if exp.exportConn encounters an error, the expiredConns slice does not get reset. This can lead to two potential issues during the next iteration of sendFlowRecords:

  1. Due to the leftovers in the slice, and maxConnsToExport not being adjusted, the size of the underlying array is doubled to fit more than 128 items.
  2. The same conn could be exported twice.

(The standalone snippet after the sendFlowRecords listing below illustrates both effects.)

func (exp *FlowExporter) sendFlowRecords() (time.Duration, error) {
	currTime := time.Now()
	var expireTime1, expireTime2 time.Duration
	exp.expiredConns, expireTime1 = exp.conntrackConnStore.GetExpiredConns(exp.expiredConns, currTime, maxConnsToExport)
	exp.expiredConns, expireTime2 = exp.denyConnStore.GetExpiredConns(exp.expiredConns, currTime, maxConnsToExport)
	// Select the shorter time out among two connection stores to do the next round of export.
	nextExpireTime := getMinTime(expireTime1, expireTime2)
	for i := range exp.expiredConns {
		if err := exp.exportConn(&exp.expiredConns[i]); err != nil {
			klog.ErrorS(err, "Error when sending expired flow record")
			return nextExpireTime, err
		}
	}
	// Clear expiredConns slice after exporting. Allocated memory is kept.
	exp.expiredConns = exp.expiredConns[:0]
	return nextExpireTime, nil
}
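Here is a standalone snippet illustrating both effects, assuming maxConnsToExport is 64 (the issue text only states that the expected limit maxConnsToExport*2 is 128): when the slice is not truncated after a failed iteration, the leftover connections are still present on the next pass (and would be exported a second time), and appending another batch forces the backing array to grow past the intended bound:

package main

import "fmt"

func main() {
	// Assumed value for illustration; the issue text only states that the
	// expected size limit maxConnsToExport*2 is 128.
	const maxConnsToExport = 64

	expiredConns := make([]int, 0, maxConnsToExport*2)

	appendBatch := func(s []int, n int) []int {
		for i := 0; i < n; i++ {
			s = append(s, i)
		}
		return s
	}

	// Iteration 1: exportConn fails part-way, so the [:0] reset is skipped.
	expiredConns = appendBatch(expiredConns, maxConnsToExport*2)
	fmt.Println("after failed iteration: len", len(expiredConns), "cap", cap(expiredConns)) // 128, 128

	// Iteration 2: the leftovers are still in the slice (they would be sent
	// again), and appending the next batch grows the backing array beyond
	// the intended 128-item bound.
	expiredConns = appendBatch(expiredConns, maxConnsToExport*2)
	fmt.Println("after next iteration:  len", len(expiredConns), "cap", cap(expiredConns)) // 256, >128
}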

To Reproduce
Deploy Antrea v1.6.0 with FlowExporter enabled, without deploying the Flow Aggregator.

Expected
Flow Exporter memory should not bloat, and stale connection objects should be cleaned from memory after a certain period.

Actual behavior
See above

Versions:

  • Antrea version: v1.6.0 (additional logging added).
  • Kubernetes version: v1.24.2
  • Container runtime: cri-o
  • Linux kernel version on the Kubernetes Nodes: 5.4.0-105-generic.
@wsquan171 wsquan171 added the kind/bug label Jul 7, 2022
@wsquan171 wsquan171 self-assigned this Jul 7, 2022
@wsquan171 wsquan171 added the area/flow-visibility/exporter label Jul 7, 2022
antoninbas pushed a commit that referenced this issue Jul 13, 2022
When the flow exporter is enabled but fails to connect to the downstream
IPFIX collector, connections added to the priority queue inside the flow
exporter won't be expired and removed from the queue, causing memory to
bloat. Furthermore, connection store polling removes a conn from its
connections map when the flow is no longer in conntrack, but not the
related items in the priority queue. When a new flow with the same flow
key is reestablished, a duplicate item with the same key is added to the
queue while the reference to the old one is lost, essentially causing a
memory leak.

This change addresses above issue in the following aspects:

* Connection store polling removes stale conn from both connections map
  and the priority queue. Since CS polling is independent of exporting
  process liveness, this allows clean up to be done without connection
  to collector.
* Fixes init of Connection.LastExportTime to be the connection start time
  to make sure the CS polling logic works properly when the exporting
  process is dead. Previously LastExportTime was only filled in by the
  exporting process at the time of export, causing the zero value to be
  compared in certain cases.
* Adds guards in priority queue to prevent having two Connections with
  same connection key in the heap.

Benchmark test BenchmarkExportConntrackConns did not show an observable
difference before and after the change.

Fixes items 1 and 2 in #3972. The severity of item 3 is lower; it will be
addressed in a later change.

Signed-off-by: Shawn Wang <wshaoquan@vmware.com>
github-actions bot commented Oct 6, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale label Oct 6, 2022
@github-actions github-actions bot closed this as not planned Jan 5, 2023