-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create and mainatain connection store for Antrea flow exporter #773
Conversation
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
b48a865
to
b2754d2
Compare
40915da
to
48a0e9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. Some quick comments.
@@ -0,0 +1,5 @@ | |||
// +build windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not check all code yet, but seems a little strange to add types_windows.go just for an empty struct. Could we use a shared struct between Windows and Linux, or move the struct to other files like conntrack_linux/windows.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using separate struct because there is dependency on unix netlink in the conntrack library we are using. I kept it outside to avoid cyclic dependency in testing mocks package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jianjuns , I removed the dependency on conntrack library for Connection struct. Thereby I could refactor lot of code--removed multiple windows and linux files. Was able to define a common interface and have separate implementations based on OS. Please take a look.
@@ -0,0 +1,17 @@ | |||
// +build windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need _windows.go for all these files? Could limit the funcs referenced outside in a single file, and so we just need a single _windows.go file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved everything into one file. Having interface in common file and defining functions in corresponding linux/windows got tricky because of dependency on linux specific conntrack library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a plan for Windows support? If a key part of the Antrea messaging is going to be unified dataplane and Windows as a first-class citizen, we don't want to have to much of a gap between the 2 platforms, and let that gap grow. @McCodeman
I brought up ovs-dpctl dump-conntrack
before and I think it is worth investigating if it can provide a unified collection mechanism for:
- Windows & Linux nodes
- all OVS datapath types, including netdev
- connections offloaded to hardware (which may not appear in conntrack?)
@srikartati do you know how ovs-dpctl dump-conntrack
compares to polling the conntrack module with netlink (e.g. in terms of quality of the information we can retrieve)?
build/yamls/antrea-eks.yml
Outdated
@@ -375,6 +375,8 @@ data: | |||
|
|||
# Enable metrics exposure via Prometheus. Initializes Prometheus metrics listener. | |||
#enablePrometheusMetrics: false | |||
# Enable flow exporter that exports IPFIX flow records for Antrea flows from conntrack module. | |||
#enableFlowExporter: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how confident are we in this configuration parameter? what are we exporting the records to (how to configure the destinations)?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the support for sending IPFIX records to collector is added, we will have flowCollector config parameter that takes in IP+port. This could be redundant then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should extend the comment to specify that this config parameter may go away in the future? @jianjuns
) | ||
|
||
const ( | ||
ctZone uint16 = 65520 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid hardcoding the ctZone in multiple different places (already defined in pkg/agent/openflow/pipeline.go
)?
a9d2c0f
to
d132a15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
@@ -0,0 +1,5 @@ | |||
// +build windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using separate struct because there is dependency on unix netlink in the conntrack library we are using. I kept it outside to avoid cyclic dependency in testing mocks package.
@@ -0,0 +1,17 @@ | |||
// +build windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved everything into one file. Having interface in common file and defining functions in corresponding linux/windows got tricky because of dependency on linux specific conntrack library.
That is definitely a good suggestion of using Can I see a couple of cons:
|
I agree we should reduce the feature gap between Linux and Windows, but on the other hand I believe Linux is still more important, so I am fine that we move faster on Linux (rather than blocking new features on Linux to wait for the Windows implementation ready). |
@jianjuns it seems we would need to invoke the executable and parse the output, but there may be an API I am not familiar with. @moshe010 can probable tell us if If it is a truly portable solution, it may be the best choice regardless of the limitation of having to call the executable. |
8b4c989
to
c04c6c5
Compare
@moshe010 has confirmed to me that |
Do you think we can use netlink to be the default implementation, and support dpctl or other ways when netlink does not work? |
I have the same question. When netlink is not supported for conntrack, can we resort to ovs-dpctl rather than using ovs-dpctl for everything? |
@jianjuns @srikartati sounds good to me, but I do recommend trying to keep Windows support on par with Linux as much as possible |
8420ae3
to
f7f602d
Compare
/test-all |
0208896
to
d0edf77
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remember to add integration tests for the connection dumper on Linux in a future PR.
cmd/antrea-agent/options.go
Outdated
@@ -103,6 +103,9 @@ func (o *Options) validate(args []string) error { | |||
if encapMode.SupportsNoEncap() && o.config.EnableIPSecTunnel { | |||
return fmt.Errorf("IPSec tunnel may only be enabled on %s mode", config.TrafficEncapModeEncap) | |||
} | |||
if o.config.OVSDatapathType == ovsconfig.OVSDatapathNetdev && features.DefaultFeatureGate.Enabled(features.FlowExporter) { | |||
return fmt.Errorf("Flow exporter is not supported for OVS datapath type %s", o.config.OVSDatapathType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we do not capitalize error messages in fmt.Errorf
} | ||
|
||
// Run polls the connTrackDumper module periodically to get connections. These connections are used | ||
// to build connection store. If there is an error in poll cycle, we break the loop and exit the routine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence no longer applies after update to the error handling below.
If there is an error in poll cycle, we break the loop and exit the routine.
func NewConnTrackInterfacer() *connTrackSystem { | ||
// Check value of setting net.netfilter.nf_conntrack_acct. Set to value 1, if it is not set. | ||
connTrackAcct, err := sysctl.GetSysctlNet("netfilter/nf_conntrack_acct") | ||
if err != nil { | ||
// Continue with creation of interfacer object as we can dump flows with no stats and that information can still be useful. | ||
// If permission error, please provide access to net.netfilter.nf_conntrack_acct. This will enable flow exporter to export stats and timestamps of connections. | ||
klog.Errorf("Error when getting net.netfilter.nf_conntrack_acct: %v", err) | ||
} else { | ||
if connTrackAcct == 0 { | ||
err = sysctl.SetSysctlNet("netfilter/nf_conntrack_acct", 1) | ||
if err != nil { | ||
// If permission error, please provide access to net.netfilter.nf_conntrack_acct. | ||
klog.Errorf("Error when setting net.netfilter.nf_conntrack_acct: %v", err) | ||
} | ||
// Set the conntrack timestamp setting to get timestamps of connections | ||
err = sysctl.SetSysctlNet("netfilter/nf_conntrack_timestamp", 1) | ||
if err != nil { | ||
klog.Errorf("Error when setting net.netfilter.nf_conntrack_timestamp: %v", err) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new code is organized a bit strangely I think.
Could we define an ensureNfConntrackFlag(flagName string, value int)
function, and then call it like this here:
ensureNfConntrackFlag("nf_conntrack_acct", 1)
ensureNfConntrackFlag("nf_conntrack_timestamp", 1)
You used to have some special handling for permission errors but now you just have these comments: "If permission error, please provide access to net.netfilter.nf_conntrack_acct". I find it a bit confusing. Shouldn't we have this in a log message so the user can see it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will change the organization accordingly.
@permisssion error: I realized that error already says permission access issue, so I removed from log. In addition, for other errors, I felt we do not need to return error as the rest of the flow data other than stats and timestamp is still useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered both conntrack settings to be independent in latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks better, but I stand by my earlier comment: since you do the same thing for both settings, why not define a helper function (e.g. ensureNfConntrackFlag
) which takes the settings name as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion here. Added helper function.
In prior way comments in the code for each setting are right before the error logs; felt that would be easier to read rather than having them in the caller of helper func.
Created connection store that stores the flows by polling conntrack module every 5s. cxnStore updates the flow if it is already there or add a new flow based on 5 tuple map. In addition, we add local pod information by querying interfaceStore. We added ipCache in interfaceStore for this purpose. Unit tests and testing on local setup is done with custom apps. Issue: antrea-io#712
Supporting Pod-to-Service flows. Eliminating duplicate flows with kube-proxy.
Added feature switch and addressed comments. Added logic to enable netfilter.conntrack_acct setting. Swapped names of connTrackPoller and connTrack to be more appropriate. Addressed comments. Issue antrea-io#712
Addressed comments. Optimized the check of sysctl settings check. Addressed merge conflicts.
9d96938
to
c4d80b0
Compare
/test-all |
/test-windows-conformance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for making the changes
Thanks for the review. |
/test-windows-conformance |
/test-windows-conformance |
/test-windows-networkpolicy |
1 similar comment
/test-windows-networkpolicy |
…a-io#773) * Create and maintain connection store for Antrea flows in conntrack Created connection store that stores the flows by polling conntrack module every 5s. cxnStore updates the flow if it is already there or add a new flow based on 5 tuple map. In addition, we add local pod information by querying interfaceStore. We added ipCache in interfaceStore for this purpose. Unit tests and testing on local setup is done with custom apps. Issue: antrea-io#712 * Connection store for flow exporter feature: Supporting Pod-to-Service flows. Eliminating duplicate flows with kube-proxy. * Connection store patch follow-up Added feature switch and addressed comments. Added logic to enable netfilter.conntrack_acct setting. Swapped names of connTrackPoller and connTrack to be more appropriate. Addressed comments. Issue antrea-io#712 * Connection store patch: Addressed comments. Optimized the check of sysctl settings check. Addressed merge conflicts. * Update base antrea-agent.conf to provide info on feature gate * Addressed comments.
…a-io#773) * Create and maintain connection store for Antrea flows in conntrack Created connection store that stores the flows by polling conntrack module every 5s. cxnStore updates the flow if it is already there or add a new flow based on 5 tuple map. In addition, we add local pod information by querying interfaceStore. We added ipCache in interfaceStore for this purpose. Unit tests and testing on local setup is done with custom apps. Issue: antrea-io#712 * Connection store for flow exporter feature: Supporting Pod-to-Service flows. Eliminating duplicate flows with kube-proxy. * Connection store patch follow-up Added feature switch and addressed comments. Added logic to enable netfilter.conntrack_acct setting. Swapped names of connTrackPoller and connTrack to be more appropriate. Addressed comments. Issue antrea-io#712 * Connection store patch: Addressed comments. Optimized the check of sysctl settings check. Addressed merge conflicts. * Update base antrea-agent.conf to provide info on feature gate * Addressed comments.
Created connection store that stores the flows by polling conntrack module
every 5s. Connection Store updates the flow if it is already there or add a new
flow based on 5 tuple map. In addition, we add local pod information by
querying interfaceStore. Added get-by-ip function in
interfaceStore indexer for this purpose.
Unit tests and testing on local setup is done with sample apps.
Issue# 712