
.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _k8s_install_broadcom_vmware_esxi_nsx:


##########################################
Installation on Broadcom VMware ESXi / NSX
##########################################


Cilium can be installed on VMware ESXi, with or without NSX, using the official images.

Deploying Cilium on Broadcom VMware vSphere ESXi with or without NSX(-T)
========================================================================

Cilium can be deployed on VMware vSphere ESXi, with or without NSX(-T). However, there are known issues when using tunnel mode with VXLAN encapsulation.

.. tabs::

    .. group-tab:: VXLAN

        Install Cilium via ``helm install`` with the VXLAN protocol:

        .. cilium-helm-install::
           :namespace: kube-system
           :set: image.pullPolicy=IfNotPresent
                 ipam.mode=kubernetes
                 tunnelProtocol=vxlan

        .. note::

            With NSX(-T), use a custom port for the ``tunnelPort`` flag, for instance ``--set tunnelPort=8223``. :gh-issue:`21801`
            tracks reports of problems with offloads when using the standard VXLAN UDP port (4789) or the draft port (8472).

    .. group-tab:: Geneve

        Install Cilium via ``helm install`` with the Geneve protocol:

        .. cilium-helm-install::
           :namespace: kube-system
           :set: image.pullPolicy=IfNotPresent
                 ipam.mode=kubernetes
                 tunnelProtocol=geneve

        .. note::

            NSX(-T) with Network Virtualization (with Edge T0/T1) also uses the Geneve protocol between transport nodes (ESXi, Edge).
            Be aware when troubleshooting that the Geneve traffic you observe on the network may be generated by either NSX(-T) or Cilium.
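After the install, you can confirm which tunnel settings the agents picked up by reading the Cilium ConfigMap. This is a sketch assuming the default ``kube-system`` namespace and the default ``cilium-config`` ConfigMap name used by the Helm chart:

.. code-block:: shell-session

    # Show the effective tunnel protocol and (if customized) tunnel port.
    kubectl -n kube-system get configmap cilium-config -o yaml | grep -i tunnel

If you changed ``tunnelPort`` or ``tunnelProtocol`` after the initial install, the agents must be restarted before the new values take effect.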

Troubleshooting
===============

Pod Communication Failure Across Hosts
--------------------------------------

When deploying Cilium with VXLAN encapsulation on older ESXi releases (7.x) or with NSX-T (3.x/4.x), inter-host pod communication may fail, except for ICMP (ping), which still works.

In the :ref:`Cilium-health status <cluster_connectivity_health>` you will see:

.. code-block:: shell-session

    ==== detail from pod cilium-mvrb6 , on node alg-cilium-cp
    Probe time:   2025-03-12T16:55:02Z
    Nodes:
    alg-cilium-cp (localhost):
        Host connectivity to 10.44.144.20:
        ICMP to stack:   OK, RTT=640.959µs
        HTTP to agent:   OK, RTT=148.15µs
        Endpoint connectivity to 10.42.0.38:
        ICMP to stack:   OK, RTT=632.181µs
        HTTP to agent:   OK, RTT=295.409µs
    alg-cilium-wk1:
        Host connectivity to 10.44.144.21:
        ICMP to stack:   OK, RTT=764.463µs
        HTTP to agent:   OK, RTT=1.154573ms
        Endpoint connectivity to 10.42.4.211:
        ICMP to stack:   OK, RTT=765.081µs
        HTTP to agent:   Get "http://10.42.4.211:4240/hello": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
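Output like the above can be collected from any Cilium agent pod. A minimal sketch, assuming the default ``kube-system`` namespace and the ``cilium`` DaemonSet name used by the Helm chart:

.. code-block:: shell-session

    # Print per-node host and endpoint connectivity details from the
    # cilium-health daemon running inside the agent.
    kubectl -n kube-system exec ds/cilium -- cilium-health status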

The problem originates from `a bug in the VMXNET3 driver <https://knowledge.broadcom.com/external/article/324199/vm-vxlan-traffic-fails-on-a-host-prepare.html>`__ related to NIC offload support for VXLAN encapsulation, triggered by the use of the outdated draft port (8472) for VXLAN.
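To see whether the NIC offloads involved in this bug are active on a node, you can inspect the offload features of its interface. A sketch, assuming the interface is named ``ens192`` (adjust to your VM's actual NIC name):

.. code-block:: shell-session

    # List UDP tunnel segmentation and checksum offload features;
    # "on" means the VMXNET3 offload path is in use for VXLAN traffic.
    ethtool -k ens192 | grep -E 'tx-udp_tnl|tx-checksum'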

In this case, change the VXLAN port with ``--set tunnelPort=8223`` or switch to the Geneve tunnel protocol with ``--set tunnelProtocol=geneve``. There is also a workaround to `disable NIC offload <https://github.com/cilium/cilium/issues/21801>`__, but it is not a recommended solution.
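Either fix can be applied to an existing installation with ``helm upgrade``. A sketch, assuming the Helm release is named ``cilium`` in the ``kube-system`` namespace (adjust both to your environment):

.. code-block:: shell-session

    # Option 1: keep VXLAN but move it off the problematic port.
    helm upgrade cilium cilium/cilium --namespace kube-system \
        --reuse-values --set tunnelPort=8223

    # Option 2: switch the encapsulation to Geneve instead.
    helm upgrade cilium cilium/cilium --namespace kube-system \
        --reuse-values --set tunnelProtocol=geneve

    # Restart the agents so the new tunnel settings take effect.
    kubectl -n kube-system rollout restart ds/cilium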