Create and maintain connection store for Antrea flow exporter #773

Merged: 6 commits, Jul 8, 2020

@srikartati srikartati commented Jun 1, 2020

Created a connection store that stores flows by polling the conntrack module
every 5s. The connection store updates a flow if it is already present, or adds
a new flow, based on a map keyed by the 5-tuple. In addition, we add local Pod
information by querying interfaceStore. A get-by-IP function was added to the
interfaceStore indexer for this purpose.

Unit tests are included, and the change was tested on a local setup with sample apps.
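The update-or-add behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the type names (`ConnectionKey`, `Connection`, `ConnectionStore`), the field set, and the plain `podByIP` map standing in for the interfaceStore lookup are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ConnectionKey is the 5-tuple identifying a flow (hypothetical type for this sketch).
type ConnectionKey struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Protocol         uint8
}

// Connection holds the per-flow fields tracked here; the field names are illustrative.
type Connection struct {
	Key       ConnectionKey
	Packets   uint64
	Bytes     uint64
	LastSeen  time.Time
	SourcePod string // empty when the source IP is not a local Pod
}

// ConnectionStore keeps one entry per 5-tuple.
type ConnectionStore struct {
	mu      sync.Mutex
	conns   map[ConnectionKey]*Connection
	podByIP map[string]string // stand-in for a get-by-IP lookup in an interface store
}

func NewConnectionStore(podByIP map[string]string) *ConnectionStore {
	return &ConnectionStore{conns: make(map[ConnectionKey]*Connection), podByIP: podByIP}
}

// AddOrUpdate refreshes an existing flow or inserts a new one.
// It returns true when a new entry was added.
func (s *ConnectionStore) AddOrUpdate(c Connection) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if existing, ok := s.conns[c.Key]; ok {
		existing.Packets = c.Packets
		existing.Bytes = c.Bytes
		existing.LastSeen = c.LastSeen
		return false
	}
	// Pod information is only looked up once, when the flow is first seen.
	c.SourcePod = s.podByIP[c.Key.SrcIP]
	s.conns[c.Key] = &c
	return true
}

// Get returns a copy of the stored connection for the given 5-tuple.
func (s *ConnectionStore) Get(k ConnectionKey) (Connection, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.conns[k]
	if !ok {
		return Connection{}, false
	}
	return *c, true
}

func main() {
	store := NewConnectionStore(map[string]string{"10.0.0.5": "default/web"})
	key := ConnectionKey{SrcIP: "10.0.0.5", DstIP: "10.0.0.9", SrcPort: 40000, DstPort: 80, Protocol: 6}
	store.AddOrUpdate(Connection{Key: key, Packets: 1, LastSeen: time.Now()})
	store.AddOrUpdate(Connection{Key: key, Packets: 7, LastSeen: time.Now()})
	c, _ := store.Get(key)
	fmt.Println(c.SourcePod, c.Packets)
}
```

A periodic poller would call `AddOrUpdate` once per dumped conntrack entry on each cycle.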

Issue# 712

@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-all: to trigger all tests.
  • /skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

@srikartati srikartati force-pushed the netviz branch 2 times, most recently from b48a865 to b2754d2 Compare June 1, 2020 23:36
@srikartati srikartati changed the title from "Create and maintain connection store for Antrea flows in conntrack module" to "Create and maintain connection store for Antrea flow exporter" Jun 1, 2020
@srikartati srikartati force-pushed the netviz branch 2 times, most recently from 40915da to 48a0e9b Compare June 2, 2020 08:14
@jianjuns (Contributor) left a comment

Thanks for working on this. Some quick comments.

pkg/agent/interfacestore/interface_cache.go (outdated, resolved)
@@ -0,0 +1,5 @@
// +build windows
Contributor

I did not check all the code yet, but it seems a little strange to add types_windows.go just for an empty struct. Could we use a shared struct between Windows and Linux, or move the struct to other files like conntrack_linux/windows.go?

Member Author

I am using a separate struct because the conntrack library we are using depends on unix netlink. I kept it outside to avoid a cyclic dependency in the testing mocks package.

Member Author

Hi @jianjuns, I removed the dependency on the conntrack library from the Connection struct. That let me refactor a lot of code and remove multiple Windows- and Linux-specific files. I was able to define a common interface and have separate implementations based on OS. Please take a look.

@@ -0,0 +1,17 @@
// +build windows
Contributor

Do we need _windows.go for all these files? Could we limit the funcs referenced from outside to a single file, so that we just need a single _windows.go file?

@srikartati (Member Author), Jun 4, 2020

I moved everything into one file. Keeping the interface in a common file and defining the functions in the corresponding linux/windows files got tricky because of the dependency on the Linux-specific conntrack library.

@srikartati srikartati self-assigned this Jun 3, 2020
@antoninbas (Contributor) left a comment

Do we have a plan for Windows support? If a key part of the Antrea messaging is going to be unified dataplane and Windows as a first-class citizen, we don't want to have too much of a gap between the 2 platforms, and let that gap grow. @McCodeman

I brought up ovs-dpctl dump-conntrack before and I think it is worth investigating if it can provide a unified collection mechanism for:

  • Windows & Linux nodes
  • all OVS datapath types, including netdev
  • connections offloaded to hardware (which may not appear in conntrack?)

@srikartati do you know how ovs-dpctl dump-conntrack compares to polling the conntrack module with netlink (e.g. in terms of quality of the information we can retrieve)?

@@ -375,6 +375,8 @@ data:

# Enable metrics exposure via Prometheus. Initializes Prometheus metrics listener.
#enablePrometheusMetrics: false
# Enable flow exporter that exports IPFIX flow records for Antrea flows from conntrack module.
#enableFlowExporter: false
Contributor

How confident are we in this configuration parameter? What are we exporting the records to (how do we configure the destinations)?

Member Author

When support for sending IPFIX records to a collector is added, we will have a flowCollector config parameter that takes an IP and port. This parameter could be redundant then.

Contributor

Do you think we should extend the comment to specify that this config parameter may go away in the future? @jianjuns

)

const (
ctZone uint16 = 65520
Contributor

can we avoid hardcoding the ctZone in multiple different places (already defined in pkg/agent/openflow/pipeline.go)?

@srikartati srikartati force-pushed the netviz branch 4 times, most recently from a9d2c0f to d132a15 Compare June 4, 2020 19:49
@srikartati (Member Author) left a comment

Thanks for the review.

@srikartati
Member Author

> Do we have a plan for Windows support? If a key part of the Antrea messaging is going to be unified dataplane and Windows as a first-class citizen, we don't want to have too much of a gap between the 2 platforms, and let that gap grow. @McCodeman
>
> I brought up ovs-dpctl dump-conntrack before and I think it is worth investigating if it can provide a unified collection mechanism for:
>
>   • Windows & Linux nodes
>   • all OVS datapath types, including netdev
>   • connections offloaded to hardware (which may not appear in conntrack?)
>
> @srikartati do you know how ovs-dpctl dump-conntrack compares to polling the conntrack module with netlink (e.g. in terms of quality of the information we can retrieve)?

Using ovs-dpctl dump-conntrack is definitely a good suggestion. As you noted, its main advantage is support for multiple OSes and OVS datapath types.
As far as I know, netlink and ovs-dpctl dump-conntrack provide similar information, and the output looked the same to the naked eye. At least for the basic flow fields that we look at, we can expect the same.

Can ovs-dpctl dump-conntrack dump the conntrack flows maintained in hardware in the case of OVS SRIOV offload?

I see a couple of cons:

  • As you know, we would need to run this as an executable and parse its output in the Antrea agent code to build the flows. Parsing cost grows with the number of flows, and the overhead could be significant, whereas a netlink connection dumps conntrack flows and exposes the fields through netlink attributes.
  • Maintenance may become costly because we would have to rely on the output format of ovs-dpctl dump-conntrack.
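To make the text-parsing concern concrete, here is a rough sketch of what parsing a single line of command output could look like. The sample line is an assumption about the output shape (protocol, an `orig=(...)` tuple, a `reply=(...)` tuple, then a `zone=` field); the real format may differ between OVS versions, which is exactly the maintenance risk raised above. `Flow`, `parseConntrackLine`, and `sampleLine` are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Flow holds the basic fields extracted from one line (illustrative subset).
type Flow struct {
	Protocol         string
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Zone             uint16
}

// sampleLine is an assumed example of a dump line, not captured from a real node.
const sampleLine = "tcp,orig=(src=10.10.1.2,dst=10.10.1.3,sport=45170,dport=80)," +
	"reply=(src=10.10.1.3,dst=10.10.1.2,sport=80,dport=45170),zone=65520,protoinfo=(state=ESTABLISHED)"

// origTupleRE captures the protocol, the original-direction tuple, and the zone.
var origTupleRE = regexp.MustCompile(
	`^(\w+),orig=\(src=([^,]+),dst=([^,]+),sport=(\d+),dport=(\d+)\).*?zone=(\d+)`)

// parseConntrackLine converts one text line into a Flow, or returns an error
// when the line does not match the assumed format.
func parseConntrackLine(line string) (Flow, error) {
	m := origTupleRE.FindStringSubmatch(line)
	if m == nil {
		return Flow{}, fmt.Errorf("unrecognized conntrack line: %q", line)
	}
	var nums [3]uint16
	for i, s := range m[4:7] {
		v, err := strconv.ParseUint(s, 10, 16)
		if err != nil {
			return Flow{}, fmt.Errorf("invalid number %q: %v", s, err)
		}
		nums[i] = uint16(v)
	}
	return Flow{
		Protocol: m[1], SrcIP: m[2], DstIP: m[3],
		SrcPort: nums[0], DstPort: nums[1], Zone: nums[2],
	}, nil
}

func main() {
	flow, err := parseConntrackLine(sampleLine)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", flow)
}
```

Every dumped connection goes through a regular-expression match and several string-to-integer conversions, and any change in field order or naming breaks the pattern. A netlink-based dump receives the same fields as typed attributes instead.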

@jianjuns
Contributor

jianjuns commented Jun 4, 2020

I agree we should reduce the feature gap between Linux and Windows, but on the other hand I believe Linux is still more important, so I am fine with moving faster on Linux (rather than blocking new features on Linux until the Windows implementation is ready).
For the ovs-dpctl discussion, do we have an idea what will be the API to get connection info in Windows and other OVS datapath types? Does OVS provide some API/interface to read the connection info?

@antoninbas
Contributor

@jianjuns it seems we would need to invoke the executable and parse the output, but there may be an API I am not familiar with.

@moshe010 can probably tell us if ovs-dpctl dump-conntrack will keep working when offloading the datapath to HW.

If it is a truly portable solution, it may be the best choice regardless of the limitation of having to call the executable.

@srikartati srikartati force-pushed the netviz branch 2 times, most recently from 8b4c989 to c04c6c5 Compare June 4, 2020 22:16
@antoninbas
Contributor

@moshe010 has confirmed to me that ovs-dpctl dump-conntrack would work when the kernel datapath is offloaded to HW. However, since it seems that packets that cause TCP state transitions still go through the kernel stack, polling conntrack with netlink should also work for that case at least.

@jianjuns
Contributor

jianjuns commented Jun 8, 2020

Do you think we can use netlink to be the default implementation, and support dpctl or other ways when netlink does not work?

@srikartati
Member Author

> Do you think we can use netlink to be the default implementation, and support dpctl or other ways when netlink does not work?

I have the same question. When netlink is not supported for conntrack, can we resort to ovs-dpctl rather than using ovs-dpctl for everything?

@antoninbas
Contributor

@jianjuns @srikartati sounds good to me, but I do recommend trying to keep Windows support on par with Linux as much as possible
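The "netlink by default, something else when netlink does not work" idea agreed on here could take roughly the following shape. This is only a sketch of the fallback pattern: the `Dumper` interface, `fakeDumper`, `errUnsupported`, and `dumpWithFallback` are hypothetical names, and the fake implementations stand in for real netlink and ovs-dpctl based dumpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Dumper is a hypothetical common interface for anything that can list conntrack flows.
type Dumper interface {
	Name() string
	DumpFlows() ([]string, error)
}

// errUnsupported models a platform where the primary mechanism is unavailable.
var errUnsupported = errors.New("conntrack dump over netlink is not supported")

// fakeDumper is a stand-in implementation so the sketch runs anywhere.
type fakeDumper struct {
	name  string
	flows []string
	err   error
}

func (d fakeDumper) Name() string                 { return d.name }
func (d fakeDumper) DumpFlows() ([]string, error) { return d.flows, d.err }

// dumpWithFallback tries the primary dumper first and only uses the fallback
// when the primary fails. It also reports which dumper produced the flows.
func dumpWithFallback(primary, fallback Dumper) ([]string, string, error) {
	flows, err := primary.DumpFlows()
	if err == nil {
		return flows, primary.Name(), nil
	}
	flows, fbErr := fallback.DumpFlows()
	if fbErr != nil {
		return nil, "", fmt.Errorf("%s failed: %v; %s failed: %v",
			primary.Name(), err, fallback.Name(), fbErr)
	}
	return flows, fallback.Name(), nil
}

func main() {
	netlink := fakeDumper{name: "netlink", err: errUnsupported}
	dpctl := fakeDumper{name: "ovs-dpctl", flows: []string{"flow-a", "flow-b"}}
	flows, used, err := dumpWithFallback(netlink, dpctl)
	fmt.Println(flows, used, err)
}
```

In practice the choice would more likely be made once at startup (per OS or datapath type) rather than on every poll, but the interface boundary is the same.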

@srikartati srikartati force-pushed the netviz branch 4 times, most recently from 8420ae3 to f7f602d Compare June 9, 2020 03:38
@srikartati srikartati changed the title from "Create and maintain connection store for Antrea flow exporter" to "[WIP] Create and maintain connection store for Antrea flow exporter" Jun 9, 2020
@antoninbas
Contributor

/test-all

@srikartati srikartati force-pushed the netviz branch 5 times, most recently from 0208896 to d0edf77 Compare July 2, 2020 19:24
@srikartati
Member Author

/test-all

@antoninbas (Contributor) left a comment

Please remember to add integration tests for the connection dumper on Linux in a future PR.

@@ -103,6 +103,9 @@ func (o *Options) validate(args []string) error {
    if encapMode.SupportsNoEncap() && o.config.EnableIPSecTunnel {
        return fmt.Errorf("IPSec tunnel may only be enabled on %s mode", config.TrafficEncapModeEncap)
    }
    if o.config.OVSDatapathType == ovsconfig.OVSDatapathNetdev && features.DefaultFeatureGate.Enabled(features.FlowExporter) {
        return fmt.Errorf("Flow exporter is not supported for OVS datapath type %s", o.config.OVSDatapathType)
Contributor

nit: we do not capitalize error messages in fmt.Errorf
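For illustration, a lower-case version of that message might look like the sketch below. The usual Go rationale is that error strings are often wrapped into longer messages, where a capital letter in the middle reads oddly. `validateDatapath` and the literal `"netdev"` value are simplifications assumed for this example, not the agent's actual option types.

```go
package main

import "fmt"

// validateDatapath is a simplified stand-in for the option validation in the diff above.
func validateDatapath(datapathType string, flowExporterEnabled bool) error {
	if datapathType == "netdev" && flowExporterEnabled {
		// Lower-case first word, no trailing punctuation.
		return fmt.Errorf("flow exporter is not supported for OVS datapath type %s", datapathType)
	}
	return nil
}

func main() {
	if err := validateDatapath("netdev", true); err != nil {
		// The message composes cleanly when a caller wraps it.
		fmt.Println(fmt.Errorf("invalid agent options: %w", err))
	}
}
```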

}

// Run polls the connTrackDumper module periodically to get connections. These connections are used
// to build connection store. If there is an error in poll cycle, we break the loop and exit the routine.
Contributor

This sentence no longer applies after the update to the error handling below:

> If there is an error in poll cycle, we break the loop and exit the routine.
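The updated error handling is not shown in this thread. One plausible shape, assuming a failed poll cycle is logged and polling simply continues on the next tick, is sketched below; `runPoller`, `errPollFailed`, and the injected `poll` and `logf` functions are hypothetical names for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errPollFailed simulates a transient failure in one poll cycle.
var errPollFailed = errors.New("conntrack dump failed")

// runPoller calls poll on every tick until stopCh is closed.
// A failed cycle is logged and the loop keeps running (assumed behaviour).
func runPoller(poll func() error, interval time.Duration, stopCh <-chan struct{},
	logf func(format string, args ...interface{})) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			if err := poll(); err != nil {
				logf("error during conntrack poll cycle: %v", err)
			}
		}
	}
}

func main() {
	calls := 0
	stopCh := make(chan struct{})
	poll := func() error {
		calls++
		if calls == 1 {
			return errPollFailed // first cycle fails, polling must continue
		}
		if calls == 3 {
			close(stopCh) // stop after a few successful cycles
		}
		return nil
	}
	logf := func(format string, args ...interface{}) { fmt.Printf(format+"\n", args...) }
	runPoller(poll, 10*time.Millisecond, stopCh, logf)
	fmt.Printf("poller stopped after %d poll cycles\n", calls)
}
```

With this shape, the doc comment on `Run` would describe logging and retrying rather than exiting the routine.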

Comment on lines 111 to 124
func NewConnTrackInterfacer() *connTrackSystem {
    // Check value of setting net.netfilter.nf_conntrack_acct. Set to value 1, if it is not set.
    connTrackAcct, err := sysctl.GetSysctlNet("netfilter/nf_conntrack_acct")
    if err != nil {
        // Continue with creation of interfacer object as we can dump flows with no stats and that information can still be useful.
        // If permission error, please provide access to net.netfilter.nf_conntrack_acct. This will enable flow exporter to export stats and timestamps of connections.
        klog.Errorf("Error when getting net.netfilter.nf_conntrack_acct: %v", err)
    } else {
        if connTrackAcct == 0 {
            err = sysctl.SetSysctlNet("netfilter/nf_conntrack_acct", 1)
            if err != nil {
                // If permission error, please provide access to net.netfilter.nf_conntrack_acct.
                klog.Errorf("Error when setting net.netfilter.nf_conntrack_acct: %v", err)
            }
            // Set the conntrack timestamp setting to get timestamps of connections
            err = sysctl.SetSysctlNet("netfilter/nf_conntrack_timestamp", 1)
            if err != nil {
                klog.Errorf("Error when setting net.netfilter.nf_conntrack_timestamp: %v", err)
            }
        }
    }
Contributor

This new code is organized a bit strangely I think.
Could we define an ensureNfConntrackFlag(flagName string, value int) function, and then call it like this here:

ensureNfConntrackFlag("nf_conntrack_acct", 1)
ensureNfConntrackFlag("nf_conntrack_timestamp", 1)

You used to have some special handling for permission errors but now you just have these comments: "If permission error, please provide access to net.netfilter.nf_conntrack_acct". I find it a bit confusing. Shouldn't we have this in a log message so the user can see it?
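A helper along the lines suggested here might look like the sketch below. It is not the code that was eventually merged. To keep the example runnable anywhere, sysctl access goes through a small `sysctlNet` interface with an in-memory `fakeSysctl`; both are hypothetical, and the helper takes that interface as an extra first argument compared with the two-argument signature proposed above.

```go
package main

import (
	"fmt"
	"log"
)

// sysctlNet abstracts reading and writing net.* sysctl settings (hypothetical interface).
type sysctlNet interface {
	Get(name string) (int, error)
	Set(name string, value int) error
}

// fakeSysctl is an in-memory stand-in for the real sysctl files.
type fakeSysctl map[string]int

func (f fakeSysctl) Get(name string) (int, error) {
	v, ok := f[name]
	if !ok {
		return 0, fmt.Errorf("sysctl %s not found", name)
	}
	return v, nil
}

func (f fakeSysctl) Set(name string, value int) error {
	f[name] = value
	return nil
}

// ensureNfConntrackFlag sets net.netfilter.<flagName> to value unless it already has it.
// Each setting is handled independently of the others.
func ensureNfConntrackFlag(s sysctlNet, flagName string, value int) error {
	path := "netfilter/" + flagName
	current, err := s.Get(path)
	if err != nil {
		return fmt.Errorf("error when getting net.netfilter.%s: %v", flagName, err)
	}
	if current == value {
		return nil
	}
	if err := s.Set(path, value); err != nil {
		return fmt.Errorf("error when setting net.netfilter.%s: %v", flagName, err)
	}
	return nil
}

func main() {
	s := fakeSysctl{"netfilter/nf_conntrack_acct": 0, "netfilter/nf_conntrack_timestamp": 0}
	for _, flag := range []string{"nf_conntrack_acct", "nf_conntrack_timestamp"} {
		if err := ensureNfConntrackFlag(s, flag, 1); err != nil {
			// Log and continue: flows without stats or timestamps are still useful.
			log.Printf("%v", err)
		}
	}
	fmt.Println(s["netfilter/nf_conntrack_acct"], s["netfilter/nf_conntrack_timestamp"])
}
```

Returning the error and logging it at the call site keeps the per-setting explanation next to the caller, and any permission problem surfaces in the logged error text.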

Member Author

Yes, I will reorganize the code accordingly.
Regarding the permission error: I realized that the error itself already reports the permission issue, so I removed it from the log. In addition, for other errors, I felt we do not need to return an error, as the flow data other than stats and timestamps is still useful.

Member Author

In the latest commit, I treat the two conntrack settings as independent.

Contributor

The code looks better, but I stand by my earlier comment: since you do the same thing for both settings, why not define a helper function (e.g. ensureNfConntrackFlag) which takes the settings name as a parameter?

@srikartati (Member Author), Jul 8, 2020

No strong opinion here. Added the helper function.
Previously, the code comments for each setting sat right before the corresponding error logs; I felt that was easier to read than having them in the caller of the helper func.

@srikartati srikartati force-pushed the netviz branch 3 times, most recently from 9d96938 to c4d80b0 Compare July 8, 2020 19:02
@srikartati
Member Author

/test-all

@srikartati
Member Author

/test-windows-conformance
/test-windows-networkpolicy

@antoninbas (Contributor) left a comment

LGTM, thanks for making the changes

@srikartati
Member Author

> LGTM, thanks for making the changes

Thanks for the review.

@srikartati
Member Author

/test-windows-conformance
/test-windows-networkpolicy

@antoninbas
Contributor

/test-windows-conformance

@antoninbas
Contributor

/test-windows-networkpolicy

@abhiraut
Contributor

abhiraut commented Jul 8, 2020

/test-windows-networkpolicy

@srikartati srikartati merged commit b3ca03e into antrea-io:master Jul 8, 2020
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
…a-io#773)

* Create and maintain connection store for Antrea flows in conntrack

Created connection store that stores the flows by polling conntrack module
every 5s. cxnStore updates the flow if it is already there or add a new
flow based on 5 tuple map. In addition, we add local pod information by
querying interfaceStore. We added ipCache in
interfaceStore for this purpose.

Unit tests and testing on local setup is done with custom apps.

Issue: antrea-io#712

* Connection store for flow exporter feature:

Supporting Pod-to-Service flows. Eliminating duplicate flows with
kube-proxy.

* Connection store patch follow-up

Added feature switch and addressed comments.
Added logic to enable netfilter.conntrack_acct setting.
Swapped names of connTrackPoller and connTrack to be more appropriate.

Addressed comments.

Issue antrea-io#712

* Connection store patch:

Addressed comments. Optimized the check of sysctl settings check.
Addressed merge conflicts.

* Update base antrea-agent.conf to provide info on feature gate

* Addressed comments.
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 23, 2020
…a-io#773)

7 participants