Fix iptables/nftables issue #3

Merged

Conversation

sridhargaddam
Member

Both iptables and nftables use the netfilter framework in the kernel for
packet filtering. Many distributions are moving toward nftables over
iptables. Although nftables has its own command-line utility (named nft),
iptables >= 1.8 uses nftables under the hood while continuing to support
the same iptables syntax for the user.

Quoting from Dan's comment [#]

"In iptables 1.8, the maintainers have "deprecated" the classic ip_tables:
the iptables tool now does userspace translation from the legacy UI/UX,
and uses nf_tables under the hood. So, the commands look and feel the
same, but they're now programming a different kernel subsystem.

The problem arises when you mix and match invocations of iptables 1.6
(the previous stable) and 1.8 on the same machine, because although they
look identical, they're programming different kernel subsystems.

Empirically, this causes weird and wonderful things to happen - things
like if you trace a packet coming from a pod, you see it flowing through
both ip_tables and nf_tables, but even if both accept the packet, it then
vanishes entirely and never gets forwarded"

So, as long as we program only one of nf_tables or ip_tables, and never
both, we should not have any issues. Currently, there is no easy way to
identify which type of rules is programmed on the host. This patch follows
the approach taken in OpenShift (as described here [*]): the host file
system is mounted inside the docker container, and the host's iptables
utility is exec'ed to program any firewall rules.

[#] kubernetes/kubernetes#71305 (comment)
[*] kubernetes/kubernetes#71305 (comment)
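
To illustrate the approach described above, here is a minimal Python sketch, not the code in this patch. It assumes the host file system is mounted at a hypothetical `/host` path inside the container and that `chroot` is available there. The helper names are made up for illustration. The second helper only classifies the backend reported in the output of `iptables --version` (iptables 1.8 prints `(nf_tables)` or `(legacy)` after the version number); it does not inspect which rules are actually programmed on the host.

```python
import re
import shlex

# Hypothetical mount point of the host file system inside the container.
HOST_ROOT = "/host"


def host_iptables_cmd(args, host_root=HOST_ROOT):
    """Build an argv that runs the host's own iptables binary via chroot."""
    return ["chroot", host_root, "iptables", *args]


def backend_from_version(version_output):
    """Classify the backend named in `iptables --version` output.

    Output without a backend suffix (as printed by iptables 1.6) is
    treated as the classic ip_tables backend.
    """
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    return match.group(1) if match else "legacy"


if __name__ == "__main__":
    # Show the command line that would be exec'ed; nothing is executed here.
    print(shlex.join(host_iptables_cmd(["--version"])))
    print(backend_from_version("iptables v1.8.4 (nf_tables)"))
```

Because every rule goes through the host's binary, the container never has to guess which kernel subsystem the host is using.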

@sridhargaddam
Member Author

JFYI, I verified this change on both ubi-minimal and Ubuntu images.

sridhargaddam added a commit to sridhargaddam/submariner that referenced this pull request Aug 27, 2019
Depends-On: submariner-io/submariner-charts#3

sridhargaddam added a commit to sridhargaddam/submariner that referenced this pull request Sep 2, 2019
…ffic

As part of supporting Network policies and for ease of debugging, this
patch implements the following.

1. Creates VxLAN tunnels in the local Cluster between the worker nodes and
   the Cluster Gateway Node.
2. Programs the necessary iptables rules on the Cluster nodes to allow
   inter-cluster traffic.
3. Avoids SNAT/MASQ for inter-cluster traffic, thereby preserving the
   original source IP of the POD all the way to the destination POD.
4. Programs the routing rules on the workerNodes to forward the remoteCluster
   traffic over the VxLAN interface that is created between the worker node
   and Cluster GatewayNode.

This patch depends on the following other patches:

Depends-On: submariner-io#135
Depends-On: submariner-io/submariner-charts#3
Depends-On: submariner-io/submariner-charts#4
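
As a rough illustration of steps 3 and 4 in the commit message above, the Python sketch below builds the kind of commands such a setup might issue. It is not taken from the referenced commit: the chain choice, the `vx-example` interface name, and the example addresses are all hypothetical. It relies on the standard netfilter behaviour that an `ACCEPT` verdict in the `nat` table ends traversal for that packet, so a later masquerade rule is never reached.

```python
import ipaddress


def no_snat_rule(remote_cidr, chain="POSTROUTING"):
    """Build an iptables argv that exempts traffic to remote_cidr from SNAT.

    Inserting an ACCEPT rule at the top of the nat chain means later
    MASQUERADE/SNAT rules are skipped, so the source IP is preserved.
    """
    net = ipaddress.ip_network(remote_cidr)  # raises ValueError if invalid
    return ["iptables", "-t", "nat", "-I", chain,
            "-d", str(net), "-j", "ACCEPT"]


def vxlan_route(remote_cidr, gateway_vtep_ip, vxlan_iface="vx-example"):
    """Build an `ip route` argv that sends remote-cluster traffic over VxLAN.

    gateway_vtep_ip is the (hypothetical) address of the gateway node on
    the VxLAN overlay; vxlan_iface is a made-up interface name.
    """
    net = ipaddress.ip_network(remote_cidr)
    ipaddress.ip_address(gateway_vtep_ip)  # validate the next hop
    return ["ip", "route", "replace", str(net),
            "via", gateway_vtep_ip, "dev", vxlan_iface]


if __name__ == "__main__":
    # Example values only; real CIDRs come from the remote cluster config.
    print(" ".join(no_snat_rule("10.1.0.0/16")))
    print(" ".join(vxlan_route("10.1.0.0/16", "240.0.0.1")))
```

The functions only build argument lists, so they can be inspected or logged before anything is executed on a node.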
@mangelajo
Contributor

Looks fine, but this has conflicts to resolve

@mangelajo
Contributor

Ok, fixed.

@Oats87
Member

Oats87 commented Sep 3, 2019

@sridhargaddam are there any SCC concerns we have due to this change? I'm pretty sure OCP's privileged allows for mounting hostPaths right?

@sridhargaddam
Member Author

sridhargaddam commented Sep 3, 2019

> @sridhargaddam are there any SCC concerns we have due to this change? I'm pretty sure OCP's privileged allows for mounting hostPaths right?

AFAICT, yes.

sridhargaddam added a commit to sridhargaddam/submariner that referenced this pull request Sep 3, 2019
@sridhargaddam
Member Author

Please note that to have a functional solution, we also need the patch from submariner-io:
submariner-io/submariner#135

Both patches should go in together, and we also have to host the updated charts in a publicly accessible repo. Currently the charts are hosted at the Rancher URL https://releases.rancher.com/submariner-charts/latest

In the KIND-based setup, it's referenced here: https://github.com/submariner-io/submariner/blob/master/scripts/kind-e2e/e2e.sh#L59

CC: @mangelajo , @Oats87 , @skitt, @tpantelis, @dimaunx

@mangelajo mangelajo merged commit 4629754 into submariner-io:master Sep 5, 2019
sridhargaddam added a commit to sridhargaddam/submariner that referenced this pull request Sep 10, 2019
sridhargaddam added a commit to sridhargaddam/submariner that referenced this pull request Sep 13, 2019
novad03 pushed a commit to novad03/k8s-submariner that referenced this pull request Nov 25, 2023
Depends-On: submariner-io/submariner-charts#3
