MLB-0001: BGP FRR enablement

Summary

The purpose of this enhancement is to use Free Range Routing (FRR) as an alternative BGP implementation in MetalLB. When directed to, MetalLB will publish prefixes via FRR rather than MetalLB’s current built-in BGP implementation.

Motivation

The motivation for this enhancement is to provide an alternative production-ready BGP implementation for use in MetalLB. Overall, this should reduce the effort for adding additional features to the MetalLB project. For example, there are a number of issues in the current backlog that may be addressed by using FRR. Notably:

FRR is a mature Linux Foundation routing protocol suite based on Quagga that has been used in many production deployments. As such, it has been proven in terms of its maturity, flexibility (as can be seen by the broad range of features it supports), scalability, security, reliability and performance. It also provides detailed logging features to aid debugging.

Goals

Provide a configuration option to allow use of alternate BGP implementations during MetalLB deployment.
Ensure feature parity with current BGP implementation in order to enable the eventual retirement of the current native BGP implementation.
Reinstate thorough BGP testing infrastructure.
Add additional documentation to indicate how to troubleshoot and debug any issues with FRR.

Non-Goals

Expose direct configuration of FRR to the MetalLB user.
Allow inter-mixing of different BGP implementation types within the same cluster.
Provide a method to integrate arbitrary BGP implementations into MetalLB.
Support runtime changing of BGP implementation type.
Extend the functionality of the MetalLB BGP implementation in order to take advantage of the additional capabilities of FRR.
Allow another actor to interact with FRR when in use by MetalLB.

Proposal

User Stories

Story 1

As a developer, I want to be able to run a comprehensive set of BGP system tests, preferably based on Kubernetes e2e tests, in order to ensure the MetalLB BGP implementation works as expected.

Story 2

As a cluster administrator, I want to be able to specify a configuration option when deploying MetalLB in order to select the MetalLB BGP implementation to be used by the MetalLB speaker.

Story 3

As a developer, I want to be able to configure FRR from MetalLB using FRR Go bindings in order to improve performance, reliability and debugging capabilities.

Risks and Mitigations

FRR does not provide documented northbound Go bindings to allow configuration of BGP. There is an experimental gRPC interface. This interface may need to be productionized through the FRR community. When this has been satisfactorily achieved, we can start Story 3.

Until that time, FRR can be configured declaratively by modifying its config file and reloading it when the configuration changes.

Design Details

FRR Integration Design Proposal

The current design assumes one BGP implementation. This proposal proposes allowing for the selection of the BGP implementation type via a global configuration option that is passed at initialization of the speaker application. This BGP implementation type would be used for all BGP connections in the cluster. This should allow for the integration of the FRR BGP implementation and potentially other BGP implementation (if required by other organizations). The intention is to modify MetalLB primarily at or below the session interface (some small changes may be required above that interface) to minimize impact of this integration and maximize reuse of common infrastructure.

session creation

In the current implementation, new BGP sessions are established by calling newBGP() in the main package which calls the bgp.New() function which in turn returns a session interface. This would be modified to create sessions based on BGP implementation type, essentially creating a session factory that would return the correct session type based on the configured BGP implementation.


┌───────────────┐
│               │
│    config     ├──┐
│ <<ConfigMap>> │  │                                         ┌───────────────┐
│               │  │reconcile                                │               │
└───────────────┘  │                                       ┌─┤  speakerlist  │
                   │                                       │ │  <<package>>  │
                   │                                       │ │               │
┌───────────────┐  │ ┌───────────────┐  ┌───────────────┐  │ └───────────────┘
│               │  │ │               │  │               │  │
│   services    ├──┼─┤      k8s      ├──┤     main      ├──┤
│  <<Service>>  │  │ │  <<package>>  │  │  <<package>>  │  │
│               │  │ │               │  │               │  │
└───────────────┘  │ └───────────────┘  └───────┬───────┘  │
                   │                    <<use>> │          │ ┌───────────────┐
                   │                            │          │ │               │
┌───────────────┐  │reconcile           ┌───────▼───────┐  │ │    config     │
│               │  │                    │               │  └─┤  <<package>>  │
│     nodes     │  │                    │   Protocol    │    │               │
│    <<Node>>   ├──┘                    │ <<interface>> │    └───────────────┘
│               │                       │               │
└───────────────┘                       └───────▲───────┘
                                                │<<implements>>
                                     ┌──────────┴─────────┐
                                     │                    │
                          ┌──────────┴────────┐  ┌────────┴──────┐
                          │       main::      │  │     main::    │
                          │ layer2_controller │  │ bgp_controller│
                          │     <<class>>     │  │   <<class>>   │
                          │                   │  │               │
                          └───────────────────┘  └───────┬───────┘
                                                         │
                                                         │  <<use>>
                                                 ┌───────▼───────┐
                                                 │               │
                                                 │    session    │
                                                 │ <<interface>> │
                                                 │               │
                                                 └───────▲───────┘
                                                         │ <<implement>>
                                             ┌───────────┴─────────────┐
                                             │                         │
                                     ┌───────┴───────┐         ┌───────┴───────┐
                                     │               │         │               │
                                     │  bgp_metallb  │         │    bgp_frr    │
                                     │  <<package>>  │         │  <<package>>  │
                                     │               │         │               │
                                     └───────────────┘         └───────┬───────┘
                                                                       │
                                                                       │
                                                               ┌───────┴───────┐
                                                               │               │
                                                               │      frr      │
                                                               │ <<container>> │
                                                               │               │
                                                               └───────────────┘

FRR deployment

FRR will be deployed in a container as part of the speaker Pod. This will simplify the deployment for end users as MetalLB will not need to manage another FRR Pod or deal with inter-Pod communication.

Deploying FRR as a separate Pod was also considered and it would give some advantages, such as the removal of the requirement for host networking from the speaker Pod and the separation of the FRR and speaker Pod lifecycles, but would add complexity to the end user.

Initially, control of the FRR container will be achieved by declaratively editing and reloading the FRR configuration file.

This configuration interface will be used by MetalLB to implement bgp.New() and bgp.Close() for session creation and bgp.Set() for prefix advertisement.

Reuse of components common to BGP implementations

There may be some common functionality in the bgp package that could be reused between BGP implementations. For example, the new bgp package will require integration with the “metrics” struct which updates Prometheus in response to BGP events. It may be necessary to refactor these types of functionality into a separate package for reuse amongst BGP implementations.

FRR Upgrades

We will allow independent upgrade of the FRR component in order to resolve any specific FRR bugs without upgrading MetalLB. MetalLB will ship with a default FRR version but it will be possible to configure the version somehow (e.g. through Helm).

Speaker Pods can be restarted for upgrades. However, Layer 2 memberlist code will see a node leave and rejoin the cluster. In BGP mode, these things can also happen (in particular "Connection reset by peer") and can be mitigated.

Upgrade/downgrade will remove a node by unlabelling the node. At this point the ‘speaker’ component on the node can be stopped and restarted with the desired version of FRR. It should be noted that as FRR only peers with peers outside the cluster, there is no requirement that each node within the cluster maintains the same version or a compatible version of FRR. However, it is required that each version of FRR is compatible with the BGP peer with which it is peering. After the ‘speaker’ component is restarted, the node can be labelled again.

Test Plan

e2e tests: The intention is to expand on the MetalLB ‘dev-env’ KIND environment. After deploying this environment, end-to-end tests will be run against this test cluster. Further investigation will be required as part of Story 1 in order to determine the possibility of reusing some of the code from the Kubernetes e2e tests. This work has begun here. unit tests: Unit tests will be added to any additional code in the MetalLB repository. upgrade: We may need to add tests to deal with an upgrade to a newer FRR version.

Version Skew Strategy

Version skew between versions of FRR should not be a concern to Kubernetes or MetalLB as long as FRR presents a stable interface to the MetalLB ‘speaker’ component. This is because FRR instances will only peer with BGP peers outside of the cluster and not with each other.

Drawbacks

This design would couple MetalLB with the FRR community, making MetalLB somewhat dependent on the FRR community for bug fixes and further feature improvements.
As FRR has a larger code base and configuration space, it will have a larger security attack surface relative to that of the current native BGP implementation of MetalLB. This could be mitigated by hardening of the configuration, for example using peer filtering and MD5 passwords.

Alternatives

In terms of enhancing the MetalLB native BGP implementation, an alternative would be to invest more effort in adding additional features.
In terms of enabling another BGP implementation, there may be other approaches that could be investigated but from the current investigation, this seems the least intrusive.

BGP Implementation Alternatives

A number of alternative open-source routing stacks were considered (FRR, BIRD, GoBGP) as a first target for integration. They were evaluated across a number of categories. GoBGP was discounted due to its relatively limited feature set. For example, it does not support BFD. FRR and BIRD are well-known and mature stacks which have been deployed in production and have active development communities. FRR was selected as the first target for integration for the following reasons:

Integration options: Both BIRD and FRR can be configured by MetalLB using the CLI and/or configuration files. However, FRR also has the option to be configured with a gRPC interface which can generate Go bindings.
Extensibility options: FRR provides additional extensibility via its dplane and FPM interfaces. This may ease integration with Kubernetes networking providers. There seem to be no equivalents in BIRD, which makes BIRD less extensible. This extensibility would enable FRR to integrate with dataplanes other than the Linux kernel. Examples: 1) hardware dataplanes 2) an Open Flow configured dataplane like Open vSwitch 3) DPDK dataplane.
Features: FRR provides support for some additional multiprotocol extensions such as l2vpn-evpn (RFC 7432 / RFC 8365) which may be applicable for some use cases and were not available in BIRD at this time. FRR also provides support for additional routing protocols such as IS-IS which are not supported by BIRD at this time. OSPF is one example of an additional protocol requested by the MetalLB community that is provided by BIRD and FRR.
Licensing: BIRD is licensed under GPL2 whereas FRR is dual-licensed under GPL2 and LGPL2.1. LGPL2.1 will give us more flexibility in terms of how we link with the FRR application.

It should be noted that, although this enhancement deals with FRR, it will provide a template and a standard interface to ease integration of other implementations (such as BIRD) in the future.

Development Phases

As this enhancement leads to the eventual retirement of the current native BGP implementation in MetalLB, a phased implementation plan consisting of 4 releases is proposed:

Development of the integration of FRR begins. No further feature development takes place on the native BGP implementation in MetalLB.
FRR is integrated as an alternative BGP implementation in MetalLB allowing a period of time (one release) for users to try out the feature in order to propose improvements. In this release, FRR is effectively in beta and is subject to change. FRR has feature parity with the current native BGP implementation but does not add any additional features.
Depending on the successful outcome of (2), a deprecation notice is raised on the MetalLB website, in the repository, and on the Slack channel indicating that the native BGP implementation will be deprecated and no feature development will take place after a certain timeframe (one release). Bug fixes are still supported in this time frame but after this time frame, users will be required to migrate to the FRR deployment or continue using an older version of MetalLB. In this release, FRR is a production release and as such both BGP implementations will be simultaneously supported. FRR has feature parity with the current native BGP implementation but does not add any additional features.
Depending on the successful outcome of (3), the native BGP implementation in MetalLB is deprecated from this point onwards. FRR is no longer limitied by feature parity with the current native implementation of BGP and additional functionality may be developed.

Test Coverage

The current e2e tests set up a bgp peer to all the nodes inside an FRR container, and try to hit the service.

In order to guarantee parity between the two implementations, the coverage of the e2e tests must be extended.

Test strategy

Every time we test an exposed service, we must verify that:

the service is reachable from outside
the nodes advertising the service ip are the expected ones
all the nodes are paired with the expected bgp peers

Changing configuration

Every time the configuration is changed before testing the new scenario, we may need to wait an arbitrary time or to run the checks in an eventually loop, since there is no feedback of what configuration is being consumed.

Another possible option is to remove the configuration completely and verify that hitting the service does not work anymore. This should ensure that it's possible to apply a new configuration.

Coverage

We must cover the following scenarios:

Cluster Traffic Policy
Local Traffic Policy
One Peer
Multiple Peers
Configuration Change (adding / removing new pools, adding / removing bgp peers)
Failover: killing one node and see that we are still able to reach the service
Different types of ranges (i.e. /24 CIDRs, /32 ranges)

BGP Parameters

We need to ensure that the BGP parameters are correctly received by the peer.

The following parameters must be covered:

node selector
source address
router-id
password (this may require another instance of frr / changing its configuration)
communities
local preferences
aggregation-length

Metrics

The current set of metrics must be covered by tests, in order to ensure that the new implementation does not regress from that perspective.