import SupportOptions from "@/components/SupportOptions"; import NextStep from "@/components/NextStep"; import Alert from "@/components/DocsAlert"; import Link from "next/link"; import Image from "next/image";

Architecture: Critical Sequences

Firezone is a distributed system with many moving parts, but some parts are especially critical to the integrity of the entire system:

- Authentication: User authentication, usually with an identity provider.
- Policy evaluation: Deciding whether to allow or deny a connection request.
- Detailed Connection Setup: How connections between Clients and Gateways are established.
- DNS resolution: Resolving DNS-based Resources.
- High availability: How Firezone achieves high availability through load balancing and automatic failover.

These will be explained in more detail below.

Authentication

Firezone authenticates users using two primary methods:

- Email (OTP): Users receive a one-time password (OTP) via email.
- OpenID Connect: Users authenticate with an identity provider that supports OpenID Connect (OIDC).

The authentication process for each is similar. Both methods begin at your Firezone account's sign-in page: `https://app.firezone.dev/<your-account>`.

However, the OIDC flow redirects the user to the identity provider for authentication before the final redirect back to Firezone.

Here's how the authentication flow works:

1. The user clicks Sign in from the Client.
2. The Client generates random 32-byte `state` and `nonce` values. These protect against certain kinds of forgery and injection attacks.
3. A browser window opens to your account's sign-in page, `https://app.firezone.dev/<your-account>`, with the `nonce` and `state` parameters.
4. The user chooses an authentication method. For OIDC, the user is redirected to the identity provider.
5. After successfully authenticating, the user is redirected back to the admin portal.
6. The admin portal mints a Firezone token from the `nonce` parameter and other information.
7. The admin portal issues a final redirect to `firezone-fd0020211111://handle_client_sign_in_callback` with the token and the `state` parameter from the initial request.
8. The Client receives this callback URL and validates that the `state` parameter matches what it originally sent. This prevents other applications from injecting tokens into the Client's callback handler.
9. The Client saves the token in a platform-specific secure storage mechanism, for example Keychain on macOS and iOS.
10. The Client now has a valid token and uses it to authenticate with the control plane API.
11. The authentication process is complete.
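The Client-side half of this flow (steps 2, 3, and 8) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Firezone's Client code: the account slug `example-account` and the `token` query parameter name are hypothetical, and only the checks described above are shown.

```python
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical account slug, for illustration only.
SIGN_IN_BASE = "https://app.firezone.dev/example-account"


def start_sign_in() -> tuple[str, str, str]:
    """Create random state and nonce values and the sign-in URL that carries them."""
    state = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded
    nonce = secrets.token_urlsafe(32)
    url = f"{SIGN_IN_BASE}?{urlencode({'state': state, 'nonce': nonce})}"
    return url, state, nonce


def handle_sign_in_callback(callback_url: str, expected_state: str) -> str:
    """Return the token from the callback only if its state matches ours."""
    params = parse_qs(urlparse(callback_url).query)
    received_state = params.get("state", [""])[0]
    # Constant-time comparison, so timing doesn't reveal how much of the state matched.
    if not hmac.compare_digest(received_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: refusing a callback we did not initiate")
    # "token" is an assumed parameter name for this sketch.
    return params["token"][0]
```

A callback carrying any `state` other than the one the Client generated raises `ValueError`, which is what stops another application from injecting its own token: it has no way to learn the random value.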

Policy evaluation

Policy evaluation is the process the Policy Engine uses to decide whether to allow or deny a connection request from a Client to a Resource.

If the request is allowed, connection setup information is sent to the Client and the appropriate Gateway. If the request is denied, it's logged and then dropped. This ensures that Clients are only connected to Gateways that are serving Resources the User is allowed to access.

<Alert color="info"> Connections in Firezone are **always** default-deny. Policies must be created to allow access. </Alert>

Here's how the process works:

1. The User attempts to access a Resource, such as `10.10.10.10`.
2. The Client sees the request and sends a connection request to the Policy Engine.
3. The Policy Engine evaluates the request against the Policies configured in your account, based on factors such as the Groups the User belongs to and the Resource being accessed. If a matching Policy is found, the connection is allowed. If no match is found, the connection is dropped.
4. If the connection is allowed, the Policy Engine sends the Client the WireGuard keys and NAT traversal information for the Gateway that will serve the Resource.
5. The Policy Engine sends similar details to the Gateway.
6. The Client and Gateway establish a WireGuard tunnel, and the Gateway sets up a forwarding rule to the Resource.
7. Connection setup is complete. The User can now access the Resource.

Since the Client only receives WireGuard keys and NAT traversal information when a connection is allowed, it's not possible for a Client to exchange packets with the Gateway until explicitly allowed by the Policy Engine.

This means Gateways remain invisible to the outside world, helping to protect against classes of attacks that perimeter-based models may be susceptible to, such as DDoS attacks.
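The default-deny decision in step 3 can be sketched minimally as below, assuming a Policy is simply a (Group, Resource) pair. Real Policies may carry further conditions, and `gateway_details` is a hypothetical stand-in for the WireGuard keys and NAT traversal information.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("policy_engine")


@dataclass(frozen=True)
class Policy:
    group: str     # Group whose members the Policy applies to
    resource: str  # Resource the Policy grants access to


def is_allowed(policies: list[Policy], user_groups: set[str], resource: str) -> bool:
    # Default-deny: access is granted only when some Policy explicitly matches.
    return any(p.resource == resource and p.group in user_groups for p in policies)


def handle_connection_request(
    policies: list[Policy],
    user_groups: set[str],
    resource: str,
    gateway_details: dict,
) -> Optional[dict]:
    """Return connection details for the Client, or None if the request is denied."""
    if not is_allowed(policies, user_groups, resource):
        log.info("denied connection request for %s", resource)
        return None  # dropped: no keys or NAT traversal info are handed out
    return gateway_details
```

With an empty Policy list, every request is denied, which is the default-deny behavior described above.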

Detailed Connection Setup

The "Policy evaluation" section above touches on this topic briefly in step 6. This section describes in more detail how that connection is established.

Firezone supports NAT traversal, which means that neither the Client nor the Gateway needs to be exposed to the public Internet. Instead of opening and forwarding a port on the NAT device, Firezone establishes direct connections using a technique called "hole-punching". If that fails, the connection falls back to using a TURN server. We operate TURN servers in every region offered by Azure to minimize the overhead, regardless of where you are in the world.

To establish connections, Firezone implements Interactive Connectivity Establishment (ICE), specified in RFC 8445. ICE is essentially an algorithm in which two peers that want to connect to each other first perform what is called "candidate gathering". They then test these candidates and nominate the best one.

Once ICE is finished, the nominated candidate pair is used to handshake a WireGuard session, which then allows encrypted packets to be sent back and forth.

Phase 1: Candidate gathering

Candidates are socket addresses a peer can send data from and receive data on.

- Host candidates: The most obvious kind of candidate. They correspond directly to the IP and port of a listening socket on a local network interface.
- Server-reflexive candidates: These correspond to the IP and port that a remote server (usually a STUN or TURN server) observes as the source address of packets sent by a peer. In other words, the public IP and port of, for example, your home router.
- Relay candidates: Relay candidates are sockets allocated on a TURN server. Traffic received on such a socket is forwarded to the owner of that socket. In Firezone, both Clients and Gateways allocate such a socket on the two geographically closest TURN servers as soon as you sign in, even before any connections are established. This ensures the entire set of candidates is ready by the time a connection is needed.
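The three candidate kinds can be modeled as a small data structure. The exact priority values Firezone assigns aren't stated here, so as an assumption this sketch uses the candidate-priority formula and type preferences recommended by RFC 8445; the relay address `203.0.113.5:50000` is hypothetical.

```python
from dataclasses import dataclass

# Type preferences recommended by RFC 8445 (section 5.1.2.2).
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str   # "host", "srflx" (server-reflexive) or "relay"
    ip: str
    port: int
    local_preference: int = 65535  # maximum value, as RFC 8445 suggests for a single-homed agent
    component: int = 1             # one component (a single UDP flow)

    @property
    def priority(self) -> int:
        # Candidate priority formula from RFC 8445, section 5.1.2.1.
        return (
            (1 << 24) * TYPE_PREFERENCE[self.kind]
            + (1 << 8) * self.local_preference
            + (256 - self.component)
        )


candidates = [
    Candidate("relay", "203.0.113.5", 50000),  # hypothetical TURN allocation
    Candidate("host", "192.168.1.10", 52625),
    Candidate("srflx", "35.10.10.10", 52625),
]
ranked = sorted(candidates, key=lambda c: c.priority, reverse=True)
```

With these defaults the priorities come out as 2130706431 (host), 1694498815 (server-reflexive), and 16777215 (relay), so direct paths always outrank relayed ones.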

Phase 2: Testing candidates

Once all relevant candidates have been gathered, connecting to another peer is as simple as exchanging and testing them. Because there is no direct connection yet, the candidates are exchanged via the Firezone control plane API. ICE then pairs every local candidate with every remote candidate, forming an NxM matrix, and tests each candidate pair for connectivity. Testing a candidate pair boils down to sending a UDP packet to the remote candidate: if an answer comes back, the test succeeds. After 12 attempts without a response, the candidate pair is considered failed. All successful candidate pairs are then ranked by priority such that direct connections rank above those involving a TURN server. The best candidate pair is then "nominated" and declared the result of the ICE algorithm.
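The testing and nomination phase can be sketched as follows. This is a simplification: real ICE paces its checks, runs them concurrently, and retransmits on timers. Here `probe` is a hypothetical callback that sends one packet and reports whether an answer came back, the local side is assumed to be the controlling agent, and pair priorities use the RFC 8445 formula.

```python
from itertools import product
from typing import Callable, Optional

MAX_ATTEMPTS = 12  # a pair is considered failed after 12 unanswered attempts

# A candidate is (address, priority); a probe sends one packet and reports a reply.
Cand = tuple[str, int]
Probe = Callable[[str, str], bool]


def pair_priority(controlling: int, controlled: int) -> int:
    # Candidate-pair priority formula from RFC 8445, section 6.1.2.3.
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def nominate(local: list[Cand], remote: list[Cand], probe: Probe) -> Optional[tuple[str, str]]:
    """Test all N x M pairs and return the highest-priority pair that got a reply."""
    succeeded = []
    for (l_addr, l_prio), (r_addr, r_prio) in product(local, remote):
        for _ in range(MAX_ATTEMPTS):
            if probe(l_addr, r_addr):
                succeeded.append((pair_priority(l_prio, r_prio), l_addr, r_addr))
                break
    if not succeeded:
        return None
    _, l_addr, r_addr = max(succeeded)
    return l_addr, r_addr


local = [("192.168.1.10:52625", 2130706431), ("35.10.10.10:52625", 1694498815),
         ("203.0.113.5:50000", 16777215)]
remote = [("10.0.0.7:52625", 2130706431), ("40.10.10.10:52625", 1694498815)]
# Pretend only the public (server-reflexive) path and one relayed path answer.
reachable = {("35.10.10.10:52625", "40.10.10.10:52625"),
             ("203.0.113.5:50000", "40.10.10.10:52625")}
best = nominate(local, remote, lambda l, r: (l, r) in reachable)
```

Because a pair's priority is dominated by the lower of its two candidate priorities, any pair containing a relay candidate ranks below the direct server-reflexive pair, so `best` is the direct path here.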

Phase 3: WireGuard handshake

WireGuard itself does not establish any connections and instead just represents a state machine that manages key rotation, encryption, and decryption of packets.

With the "nominated" candidate pair as the output of the ICE algorithm, we now have a pair of socket addresses for exchanging UDP packets between Client and Gateway. WireGuard's handshake requires prior knowledge of the remote's public key. Like the candidates, these keys have already been exchanged between Client and Gateway via the Firezone control plane API. Using these public keys, Client and Gateway derive shared session keys via Diffie-Hellman. These session keys are then used to encrypt packets and are rotated every 2 minutes.
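To illustrate only the core idea, this toy sketch performs a plain, unauthenticated finite-field Diffie-Hellman exchange with deliberately small, insecure parameters (a 127-bit Mersenne prime). It is not WireGuard's handshake, which uses X25519 inside a Noise-based protocol and mixes in both peers' static public keys. The `needs_rekey` helper mirrors the 2-minute rotation described above.

```python
import hashlib
import secrets

# Toy group parameters: insecure and NOT what WireGuard uses (it uses X25519).
P = 2**127 - 1  # a Mersenne prime
G = 3

REKEY_AFTER_SECONDS = 120  # session keys are rotated every 2 minutes


def keypair() -> tuple[int, int]:
    private = secrets.randbelow(P - 2) + 1  # random exponent in [1, P - 2]
    return private, pow(G, private, P)


def session_key(my_private: int, their_public: int) -> bytes:
    # Both sides arrive at G^(a*b) mod P without ever sending a private value.
    shared = pow(their_public, my_private, P)
    # Hash the shared secret into a 32-byte symmetric key.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def needs_rekey(key_created_at: float, now: float) -> bool:
    return now - key_created_at >= REKEY_AFTER_SECONDS
```

Each side only ever transmits its public value; both then compute the same session key locally.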

Hole-punching

The somewhat magical part, hole-punching, actually happens entirely implicitly as part of this process. Most NAT devices are stateful: they remember the source port of an outgoing packet and allow reply packets arriving on that port back in. Both peers in the above algorithm form the same NxM matrix of candidates, so each one sends packets to the same socket address the other one is sending from.

For example, assume a Client's public IP is 35.10.10.10 and the Gateway's public IP is 40.10.10.10. By default, both Firezone Clients and Firezone Gateways listen locally on port 52625. During candidate gathering, each learns its public IP (its server-reflexive candidate). The NxM matrix on both sides will therefore include the pair `35.10.10.10:52625 <> 40.10.10.10:52625`.

1. The Client sends a UDP packet from its local socket, using source port 52625, to 40.10.10.10:52625.
2. The Gateway sends a UDP packet from its local socket, using source port 52625, to 35.10.10.10:52625.
3. The NAT device on the Client's network rewrites the source of the outgoing packet to 35.10.10.10:52625. It also registers a new "connection" on this port.
4. The NAT device on the Gateway's network rewrites the source of the outgoing packet to 40.10.10.10:52625. It also registers a new "connection" on this port.
5. The packet from the Client arrives at the Gateway's NAT device. The lookup by source address (35.10.10.10:52625) yields the connection created in step 4.
6. The packet from the Gateway arrives at the Client's NAT device. The lookup by source address (40.10.10.10:52625) yields the connection created in step 3.

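Due to differing latencies, the timing of these steps can vary in practice. That isn't an issue, though: if a packet arrives before the other side's NAT has registered its own "connection", it is simply dropped, and because UDP is stateless and ICE keeps sending, the next packet is allowed through.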
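The six steps, and the early-packet case, can be simulated with a toy NAT model. This assumes port-preserving NATs that admit a packet only from an address they previously sent to, which is common but not universal; the `Nat` class is purely illustrative.

```python
from dataclasses import dataclass, field

Addr = tuple[str, int]


@dataclass
class Nat:
    """Toy stateful NAT that keeps the source port (many real NATs remap it)."""
    public_ip: str
    sessions: set = field(default_factory=set)  # (public port, remote address)

    def outbound(self, local_port: int, dst: Addr) -> Addr:
        # Rewrite the source and remember the "connection" to dst.
        self.sessions.add((local_port, dst))
        return (self.public_ip, local_port)

    def inbound(self, src: Addr, dst_port: int) -> bool:
        # Let the packet in only if we previously sent from dst_port to src.
        return (dst_port, src) in self.sessions


PORT = 52625
client_nat, gateway_nat = Nat("35.10.10.10"), Nat("40.10.10.10")

# Steps 1-4: both sides send, and each NAT registers an outgoing "connection".
client_src = client_nat.outbound(PORT, ("40.10.10.10", PORT))
gateway_src = gateway_nat.outbound(PORT, ("35.10.10.10", PORT))

# Steps 5-6: each packet matches the state the other side's NAT just created.
assert gateway_nat.inbound(client_src, PORT)
assert client_nat.inbound(gateway_src, PORT)

# Timing variation: a packet that arrives before the remote NAT has sent is dropped...
late_nat = Nat("40.10.10.10")
early_src = Nat("35.10.10.10").outbound(PORT, ("40.10.10.10", PORT))
assert not late_nat.inbound(early_src, PORT)
# ...but once that NAT has sent its own packet, the next one gets through.
late_nat.outbound(PORT, ("35.10.10.10", PORT))
assert late_nat.inbound(early_src, PORT)
```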

Relayed connections

Relayed connections, i.e. those involving a TURN server, work in a very similar way and are in fact transparent to the WireGuard handshake and the encrypted packets. For relayed connections, the output of ICE is still a candidate pair, except that the "local" side of the pair is the allocated socket on the TURN server. In other words, a relayed candidate pair is nominated only when none of the candidate pairs involving host and server-reflexive candidates succeeded, leaving a relayed pair as the highest-priority successful one.

To send data through a TURN server, the sender constructs a channel-data packet. This packet is a lightweight wrapper consisting of a 4-byte header that identifies a previously allocated channel. Channel allocation is handled as part of the TURN client-server protocol. Each channel is bound to a single remote peer, and all packets sent on that channel are forwarded to that peer. When the TURN server receives a channel-data packet, it removes the header and forwards the remaining payload to the designated peer. To improve efficiency and throughput, Firezone's TURN servers use the eBPF eXpress Data Path (XDP) to route channel-data packets directly in the kernel, specifically in the network card driver, before the rest of the networking stack parses the packet.

To the receiver of the packet, this process is transparent. It simply sends and receives UDP traffic to and from an IP and port and, apart from heuristics based on, for example, IP geolocation, cannot tell whether that socket belongs to a relay or to the remote peer directly.
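The channel-data framing described above is simple enough to sketch in a few lines. Firezone performs this routing with eBPF/XDP in the kernel; the userspace Python below only illustrates the ChannelData format from the TURN specification (RFC 8656: a 16-bit channel number followed by a 16-bit payload length). The `relay` function and its `bindings` table are a hypothetical simplification of the server's channel-to-peer lookup.

```python
import struct

Addr = tuple[str, int]


def encode_channel_data(channel: int, payload: bytes) -> bytes:
    """Wrap a payload in a 4-byte ChannelData header: channel number, then length."""
    # RFC 8656 reserves 0x4000-0x4FFF for channel numbers.
    if not 0x4000 <= channel <= 0x4FFF:
        raise ValueError(f"invalid channel number {channel:#06x}")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length field")
    return struct.pack("!HH", channel, len(payload)) + payload


def relay(packet: bytes, bindings: dict[int, Addr]) -> tuple[Addr, bytes]:
    """What a TURN server does on receipt: strip the header and pick the bound peer."""
    if len(packet) < 4:
        raise ValueError("truncated ChannelData header")
    channel, length = struct.unpack_from("!HH", packet)
    peer = bindings[channel]  # each channel is bound to exactly one peer
    return peer, packet[4 : 4 + length]
```

Because the header is fixed-size and the lookup is a single table access, this is the kind of work that fits well in a kernel fast path.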

DNS resolution

Secure DNS resolution is a critical function in most organizations.

Firezone employs a unique, granular approach to split DNS that routes only traffic intended for DNS-based Resources through Firezone and leaves other traffic untouched, even when resolved IP addresses overlap.

To achieve this, Firezone embeds a tiny, in-memory DNS resolver in each Client that intercepts all DNS queries on the system.

When the resolver sees a query that doesn't match a known Resource, it operates in pass-through mode, forwarding the query to the system's default resolvers or to the upstream resolvers configured in your account.

If the query matches a Resource, however, the following happens:

1. The resolver generates a special, internal IP from the range 100.96.0.0/11 or fd00:2021:1111:8000::/107 and stores an internal mapping of this IP to the DNS name originally queried. The IP is returned to the application that made the query.
2. When the Client sees traffic for the IP, it triggers connection setup. The Client sends a request to the Policy Engine for evaluation. If the request is allowed, the Policy Engine finds an appropriate Gateway to route the traffic.
3. If the Policy Engine approves the Client's request, it sends the Client the WireGuard keys and NAT traversal information for the Gateway that will serve the Resource.
4. Once the connection is established (and every time an application re-queries a DNS name), the Client sends a message using Firezone's p2p control protocol to the Gateway, triggering a DNS query for the DNS name using the Gateway's system resolver. The result of this DNS query is stored on the Gateway in a lookup table and cached for 30 seconds.
5. Traffic from the application flows, and the Gateway translates the internal IP back to the actual IP address of the Resource, forwarding the traffic accordingly.

This is why you'll see DNS-based Resources resolve to IPs such as 100.96.0.1 while the Client is signed in:

```text
> nslookup github.com
Server: 100.100.111.1
Address: 100.100.111.1#53

Non-authoritative answer:
Name: github.com
Address: 100.96.0.1
```

Notice in the above process that at no point does the Client's system resolver see the actual IP address of the Resource. This ensures that your DNS data remains private and secure.
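The two lookup tables in the steps above can be sketched as follows. This is a hypothetical Python model, not Firezone's implementation. It assumes IPv4 only, allocates mapped addresses sequentially from `100.96.0.0/11` (so the first Resource gets `100.96.0.1`), and leaves out the IPv6 range and the actual interception of DNS packets.

```python
import ipaddress
from typing import Callable, Optional

IpAddr = ipaddress.IPv4Address


class StubResolver:
    """Client side: hands out mapped IPs for Resources, passes everything else through."""

    def __init__(self, resources: set[str], upstream: Callable[[str], IpAddr]):
        self.resources = resources
        self.upstream = upstream
        # Sequential allocation from 100.96.0.0/11 is an assumption of this sketch.
        self._pool = iter(ipaddress.ip_network("100.96.0.0/11").hosts())
        self._by_name: dict[str, IpAddr] = {}
        self._by_ip: dict[IpAddr, str] = {}

    def resolve(self, name: str) -> IpAddr:
        if name not in self.resources:
            return self.upstream(name)  # pass-through mode
        if name not in self._by_name:
            ip = next(self._pool)
            self._by_name[name], self._by_ip[ip] = ip, name
        return self._by_name[name]

    def name_for(self, mapped_ip: IpAddr) -> Optional[str]:
        # Used when traffic for a mapped IP triggers connection setup.
        return self._by_ip.get(mapped_ip)


class GatewayDnsCache:
    """Gateway side: resolves the real address and caches it for 30 seconds."""

    TTL_SECONDS = 30.0

    def __init__(self, system_resolve: Callable[[str], IpAddr]):
        self.system_resolve = system_resolve
        self._entries: dict[str, tuple[IpAddr, float]] = {}

    def lookup(self, name: str, now: float) -> IpAddr:
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            entry = (self.system_resolve(name), now + self.TTL_SECONDS)
            self._entries[name] = entry
        return entry[0]
```

The application only ever sees the mapped address; the real address lives solely in the Gateway's cache.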

For a deeper dive into how (and why) DNS works this way in Firezone, see the How DNS works in Firezone article.

Why Firezone uses a mapped address for DNS Resources

This is a common source of confusion among new Firezone users, so it's helpful to explain why Firezone uses mapped IPs for DNS Resources instead of simply using the actual resolved IP.

Consider the case where two DNS Resources resolve to the same IP address, such as when name-based virtual hosting is used to host two web applications on the same server:

- gitlab.company.com resolves to IP 172.16.0.5
- jenkins.company.com also resolves to IP 172.16.0.5

Remember that routing happens at the IP level. We can't independently route packets for the same IP to two different places. If Firezone used the Resource's actual IP address to route packets, the User would be able to access jenkins.company.com if they were granted access only to gitlab.company.com.

Using mapped IPs allows Firezone to securely route DNS Resources no matter how many other services share the same IP address.
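The collision can be made concrete with a few lines of Python. Because routing only sees destination IPs, granting access to one name effectively grants access to every name behind the same IP, unless each name gets its own mapped address. The mapped addresses below are hypothetical.

```python
# Both names sit behind one server (name-based virtual hosting).
REAL_IPS = {"gitlab.company.com": "172.16.0.5", "jenkins.company.com": "172.16.0.5"}
# Hypothetical mapped IPs, one per DNS Resource.
MAPPED_IPS = {"gitlab.company.com": "100.96.0.1", "jenkins.company.com": "100.96.0.2"}


def can_reach(target: str, granted: set[str], ip_of: dict[str, str]) -> bool:
    """Routing only sees the destination IP, so compare IPs, never names."""
    routed_ips = {ip_of[name] for name in granted}
    return ip_of[target] in routed_ips


granted = {"gitlab.company.com"}
print(can_reach("jenkins.company.com", granted, REAL_IPS))    # True: access leaks
print(can_reach("jenkins.company.com", granted, MAPPED_IPS))  # False: isolated
```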

High availability

Firezone was designed from the ground up to support high availability requirements. This is achieved through a combination of load balancing and automatic failover, described below.

Load balancing

When a Client wants to connect to a Resource, Firezone automatically selects a healthy Gateway in the Site to handle the request based on the Client's geolocated IP address. The system calculates the geographic distance to each available Gateway and selects the one that is closest to the Client's location. This ensures optimal performance with the lowest possible latency.

The Client maintains the connection to that Gateway until either the Client disconnects or the Gateway becomes unhealthy.

This effectively shards Client connections across all Gateways in a Site, achieving higher overall throughput than otherwise possible with a single Gateway.
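How the geographic distance is computed isn't specified here. This sketch assumes great-circle (haversine) distance and uses hypothetical Gateway names and coordinates; unhealthy Gateways are skipped, matching the health check described in the next section.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Gateway:
    name: str
    lat: float
    lon: float
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_gateway(lat: float, lon: float, gateways: list[Gateway]) -> Optional[Gateway]:
    """Pick the healthy Gateway closest to the Client's geolocated position."""
    healthy = [g for g in gateways if g.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda g: haversine_km(lat, lon, g.lat, g.lon))


site = [Gateway("gw-frankfurt", 50.11, 8.68), Gateway("gw-virginia", 38.90, -77.04)]
nearest = select_gateway(48.86, 2.35, site)  # a Client geolocated to Paris
```

For a Client in Paris, `nearest` is `gw-frankfurt`; if that Gateway were marked unhealthy, the same call would return `gw-virginia`.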

Automatic failover

Two or more Gateways deployed within a Site provide automatic failover in the event of a Gateway failure.

Here's how it works:

1. When the admin portal detects that a particular Gateway is unhealthy, it stops using it for new connection requests to Resources in the Site.
2. Existing Clients remain connected to the Gateway until they themselves detect it to be unhealthy.
3. Clients identify unhealthy Gateways using keepalive timers. If the timer expires, the Client disconnects from the unhealthy Gateway and requests a new, healthy one from the portal.
4. The Client keepalive timer expires after 10 seconds. This is the maximum time it takes for existing Client connections to be rerouted to a healthy Gateway in the event of a Gateway failure.

By using two independent health checks in the portal and the Client, Firezone ensures that temporary network issues between the Client and portal do not interrupt existing connections to healthy Gateways.
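The Client-side half of this failover can be sketched as a simple keepalive timer. Assumptions: any packet received from the Gateway resets the timer, and `request_new_gateway` is a hypothetical stand-in for asking the portal for a healthy Gateway other than the failed one.

```python
from typing import Callable

KEEPALIVE_TIMEOUT_SECONDS = 10.0


class GatewayConnection:
    """Tracks when we last heard from a Gateway (timestamps in seconds)."""

    def __init__(self, gateway_id: str, now: float):
        self.gateway_id = gateway_id
        self.last_seen = now

    def on_packet_received(self, now: float) -> None:
        self.last_seen = now  # assumed: any packet from the Gateway resets the timer

    def is_expired(self, now: float) -> bool:
        return now - self.last_seen >= KEEPALIVE_TIMEOUT_SECONDS


def maintain(conn: GatewayConnection, now: float,
             request_new_gateway: Callable[[str], str]) -> GatewayConnection:
    """Keep a live connection; replace an expired one with a new Gateway from the portal."""
    if conn.is_expired(now):
        return GatewayConnection(request_new_gateway(conn.gateway_id), now)
    return conn
```

Because the timer runs on the Client and only watches traffic from the Gateway itself, a brief loss of contact with the portal does not tear down a working connection.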

{(<NextStep href="/kb/architecture/security-controls">Next: Security controls</NextStep>)}