Documentation/adrs/add-nftables-implementation.md
Date: 2024-02-01
Status: Writing
At the moment, flannel uses iptables to masquerade and forward packets. Our implementation is based on the go-iptables library from CoreOS (https://github.com/coreos/go-iptables).
There are several issues with using iptables in flannel:
References:
In flannel code, all references to iptables are wrapped in the iptables package.
The package provides the type IPTableRule to represent an individual rule. This type is almost entirely internal to the package, so it would be easy to refactor the code and hide it behind a more abstract type that works for both iptables and nftables rules.
Unfortunately, the package doesn't expose an interface for managing rules, so it needs to be refactored before flannel can provide both an iptables-based and an nftables-based implementation.
The package does include several Go interfaces (IPTables, IPTablesError), which are used for testing.
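The refactoring described above could take roughly the following shape. This is a minimal sketch, not flannel's actual code: the `Rule` type, the `RuleRenderer` interface, and both renderer types are hypothetical names introduced for illustration, and the rendered strings only cover a simple source/destination match.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical backend-neutral rule: match a source and a
// destination CIDR inside a chain, then apply a verdict.
type Rule struct {
	Chain   string // e.g. "FLANNEL-POSTRTG"
	Source  string // source CIDR
	Dest    string // destination CIDR
	Verdict string // "RETURN", "ACCEPT" or "MASQUERADE"
}

// RuleRenderer is a hypothetical interface that both backends would satisfy.
type RuleRenderer interface {
	Name() string
	Render(r Rule) string
}

type iptablesRenderer struct{}

func (iptablesRenderer) Name() string { return "iptables" }

// Render produces the arguments of an "iptables -A" invocation.
func (iptablesRenderer) Render(r Rule) string {
	return fmt.Sprintf("-A %s -s %s -d %s -j %s", r.Chain, r.Source, r.Dest, r.Verdict)
}

type nftablesRenderer struct{}

func (nftablesRenderer) Name() string { return "nftables" }

// Render produces an "nft add rule" command for the dedicated flannel table.
func (nftablesRenderer) Render(r Rule) string {
	return fmt.Sprintf("add rule ip flannel %s ip saddr %s ip daddr %s %s",
		strings.ToLower(r.Chain), r.Source, r.Dest, strings.ToLower(r.Verdict))
}

func main() {
	// Example CIDR values, chosen only for illustration.
	r := Rule{Chain: "FLANNEL-POSTRTG", Source: "10.244.0.0/16", Dest: "10.244.0.0/16", Verdict: "RETURN"}
	for _, b := range []RuleRenderer{iptablesRenderer{}, nftablesRenderer{}} {
		fmt.Printf("%-8s %s\n", b.Name(), b.Render(r))
	}
}
```

The rest of flannel would then depend only on the interface, so the choice of backend stays local to the package.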
Ideally, flannel will include both an iptables and an nftables implementation. These need to coexist in the code but will be mutually exclusive at runtime.
The choice of which implementation to use will be triggered by an optional CLI flag. iptables will remain the default for the time being.
Using nftables is an opportunity to optimise the rules deployed by flannel, but we need to be careful about backward compatibility with the current backend.
Starting flannel in either mode should reset the other mode as best as possible to ensure that users don't need to reboot if they need to change mode.
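The runtime behaviour described above (an optional flag, iptables as the default, and a best-effort reset of the other mode) can be sketched as follows. The flag name `--use-nftables`, the `backend` interface, and `fakeBackend` are assumptions made for this example; the fake backend only records calls and does not touch the kernel.

```go
package main

import (
	"flag"
	"fmt"
)

// backend is a hypothetical abstraction over the iptables and nftables code paths.
type backend interface {
	Setup() error   // install flannel's rules
	Cleanup() error // remove any rules this backend left behind
}

// fakeBackend records calls instead of touching the kernel.
type fakeBackend struct {
	name string
	log  *[]string
}

func (f fakeBackend) Setup() error   { *f.log = append(*f.log, f.name+":setup"); return nil }
func (f fakeBackend) Cleanup() error { *f.log = append(*f.log, f.name+":cleanup"); return nil }

// parseUseNftables reads the hypothetical --use-nftables flag.
// The flag defaults to false, so iptables stays the default.
func parseUseNftables(args []string) (bool, error) {
	fs := flag.NewFlagSet("flanneld", flag.ContinueOnError)
	useNft := fs.Bool("use-nftables", false, "use nftables instead of iptables")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *useNft, nil
}

// start cleans up the inactive backend on a best-effort basis,
// then sets up the active one.
func start(useNft bool, ipt, nft backend) error {
	active, inactive := ipt, nft
	if useNft {
		active, inactive = nft, ipt
	}
	if err := inactive.Cleanup(); err != nil {
		// Best effort only: a failed cleanup must not block startup.
		fmt.Println("warning: cleanup failed:", err)
	}
	return active.Setup()
}

func main() {
	var calls []string
	useNft, err := parseUseNftables([]string{"--use-nftables"})
	if err != nil {
		panic(err)
	}
	if err := start(useNft, fakeBackend{"iptables", &calls}, fakeBackend{"nftables", &calls}); err != nil {
		panic(err)
	}
	fmt.Println(calls)
}
```

Cleaning up the inactive backend before setting up the active one is what lets a user switch modes by restarting flannel rather than rebooting the node.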
Currently, flannel uses two dedicated chains for its own rules: FLANNEL-POSTRTG and FLANNEL-FWD.
- It adds rules to the FORWARD and POSTROUTING chains to direct traffic to its own chains.
- Rules in FLANNEL-POSTRTG are used to manage masquerading of the traffic to and from the pods.
- Rules in FLANNEL-FWD are used to ensure that traffic to and from the flannel network can be forwarded.

With nftables, flannel would have its own dedicated table (flannel) with arbitrary chains and rules as needed.
See https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)

    # !! untested example
    # $pod_cidr, $cluster_cidr and $flannel_network are placeholders that
    # must be set with "define" statements before this ruleset is loaded
    table ip flannel {
        chain flannel-postrtg {
            type nat hook postrouting priority 0;
            # kube-proxy: skip packets it has already marked
            meta mark & 0x4000 == 0x4000 return
            # don't NAT traffic within overlay network
            ip saddr $pod_cidr ip daddr $cluster_cidr return
            ip saddr $cluster_cidr ip daddr $pod_cidr return
            # Prevent performing Masquerade on external traffic which arrives
            # from a Node that owns the container/pod IP address
            ip saddr != $pod_cidr ip daddr $cluster_cidr return
            # NAT if it's not multicast traffic
            ip saddr $cluster_cidr ip daddr != 224.0.0.0/4 masquerade
            # Masquerade anything headed towards flannel from the host
            ip saddr != $cluster_cidr ip daddr $cluster_cidr masquerade
        }
        chain flannel-fwd {
            # forward hook with the default accept policy, so that unrelated
            # forwarded traffic is not dropped by this chain
            type filter hook forward priority 0;
            # allow traffic to be forwarded if it is to or from the flannel network range
            ip saddr $flannel_network accept
            ip daddr $flannel_network accept
        }
    }

We can either:
- call the nft executable directly
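As a rough illustration of calling the nft executable directly, the sketch below prepares an `nft -f -` command that reads a ruleset from standard input. The helper names `renderDefines` and `nftCommand` and the example CIDR values are assumptions made for this example. The command is only built, not run, because applying a ruleset requires root privileges and the nft binary.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// renderDefines emits nft "define" statements for the variables
// referenced by the ruleset ($pod_cidr and $cluster_cidr).
func renderDefines(podCIDR, clusterCIDR string) string {
	return fmt.Sprintf("define pod_cidr = %s\ndefine cluster_cidr = %s\n", podCIDR, clusterCIDR)
}

// nftCommand prepares "nft -f -", which reads the ruleset from standard input.
// The command is returned unstarted so the caller decides when to run it.
func nftCommand(ruleset string) *exec.Cmd {
	cmd := exec.Command("nft", "-f", "-")
	cmd.Stdin = strings.NewReader(ruleset)
	return cmd
}

func main() {
	// Example CIDR values, chosen only for illustration.
	ruleset := renderDefines("10.244.1.0/24", "10.244.0.0/16") +
		"table ip flannel {\n}\n"
	cmd := nftCommand(ruleset)
	fmt.Println(strings.Join(cmd.Args, " "))
	fmt.Print(ruleset)
	// cmd.Run() would apply the ruleset; it is deliberately not called here.
}
```

Feeding the whole ruleset through a single nft invocation lets nft apply it as one transaction, at the cost of building and escaping the ruleset text by hand.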