integration/README.md
Headscale's integration tests start a real Headscale server and run scenarios against real Tailscale clients across supported versions, all inside Docker. They are the safety net that keeps us honest about Tailscale protocol compatibility.
This file documents how to write integration tests. For how to
run them, see ../cmd/hi/README.md.
Tests live in files ending with _test.go; the framework lives in the
rest of this directory (scenario.go, tailscale.go, helpers, and the
hsic/, tsic/, dockertestutil/ packages).
For local runs, use cmd/hi:
go run ./cmd/hi doctor
go run ./cmd/hi run "TestPingAllByIP"
Alternatively, act runs the GitHub
Actions workflow locally:
act pull_request -W .github/workflows/test-integration.yaml
Each test runs as a separate workflow on GitHub Actions. To add a new
test, run go generate inside ../cmd/gh-action-integration-generator/
and commit the generated workflow file.
The integration framework has four layers:
scenario.go — Scenario orchestrates a test environment: a
Headscale server, one or more users, and a collection of Tailscale
clients. NewScenario(spec) returns a ready-to-use environment.hsic/ — "Headscale Integration Container": wraps a Headscale
server in Docker. Options for config, DB backend, DERP, OIDC, etc.tsic/ — "Tailscale Integration Container": wraps a single
Tailscale client. Options for version, hostname, auth method, etc.dockertestutil/ — low-level Docker helpers (networks, container
lifecycle, IsRunningInContainer() detection).Tests compose these pieces via ScenarioSpec and CreateHeadscaleEnv
rather than calling Docker directly.
IntegrationSkip(t)Every integration test function must call IntegrationSkip(t) as
its first statement. Without it, the test runs in the wrong environment
and fails with confusing errors.
func TestMyScenario(t *testing.T) {
IntegrationSkip(t)
// ... rest of the test
}
IntegrationSkip is defined in integration/scenario_test.go:15 and:
dockertestutil.IsRunningInContainer()),-short is passed to go test.The canonical setup creates users, clients, and the Headscale server in one shot:
func TestMyScenario(t *testing.T) {
IntegrationSkip(t)
t.Parallel()
spec := ScenarioSpec{
NodesPerUser: 2,
Users: []string{"alice", "bob"},
}
scenario, err := NewScenario(spec)
require.NoError(t, err)
defer scenario.ShutdownAssertNoPanics(t)
err = scenario.CreateHeadscaleEnv(
[]tsic.Option{tsic.WithSSH()},
hsic.WithTestName("myscenario"),
)
require.NoError(t, err)
allClients, err := scenario.ListTailscaleClients()
require.NoError(t, err)
headscale, err := scenario.Headscale()
require.NoError(t, err)
// ... assertions
}
Review scenario.go and hsic/options.go / tsic/options.go for the
full option set (DERP, OIDC, policy files, DB backend, ACL grants,
exit-node config, etc.).
EventuallyWithT patternIntegration tests operate on a distributed system with real async
propagation: clients advertise state, the server processes it, updates
stream to peers. Direct assertions after state changes fail
intermittently. Wrap external calls in assert.EventuallyWithT:
assert.EventuallyWithT(t, func(c *assert.CollectT) {
status, err := client.Status()
assert.NoError(c, err)
for _, peerKey := range status.Peers() {
peerStatus := status.Peer[peerKey]
requirePeerSubnetRoutesWithCollect(c, peerStatus, expectedRoutes)
}
}, 10*time.Second, 500*time.Millisecond, "client should see expected routes")
These read distributed state and may reflect stale data until propagation completes:
headscale.ListNodes()client.Status()client.Curl()client.Traceroute()client.Execute() when the command reads stateState-mutating commands run exactly once and either succeed or fail
immediately — not eventually. Wrapping them in EventuallyWithT hides
real failures behind retry.
Use client.MustStatus() when you only need an ID for a blocking call:
// CORRECT — mutation runs once
for _, client := range allClients {
status := client.MustStatus()
_, _, err := client.Execute([]string{
"tailscale", "set",
"--advertise-routes=" + expectedRoutes[string(status.Self.ID)],
})
require.NoErrorf(t, err, "failed to advertise route: %s", err)
}
Typical blocking operations: any tailscale set (routes, exit node,
accept-routes, ssh), node registration via the CLI, user creation via
gRPC.
One external call per EventuallyWithT block. Related assertions
on the result of a single call go together in the same block.
Loop exception: iterating over a collection of clients (or peers)
and calling Status() on each inside a single block is allowed — it
is the same logical "check all clients" operation. The rule applies
to distinct calls like ListNodes() + Status(), which must be
split into separate blocks.
Never nest EventuallyWithT calls. A nested retry loop
multiplies timing windows and makes failures impossible to diagnose.
Use *WithCollect helper variants inside the block. Regular
helpers use require and abort on the first failed assertion,
preventing retry.
Always provide a descriptive final message — it appears on failure and is your only clue about what the test was waiting for.
Variables used across multiple EventuallyWithT blocks must be declared
at function scope. Inside the block, assign with =, not := — :=
creates a shadow invisible to the outer scope:
var nodes []*v1.Node
var err error
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err = headscale.ListNodes() // = not :=
assert.NoError(c, err)
assert.Len(c, nodes, 2)
requireNodeRouteCountWithCollect(c, nodes[0], 2, 2, 2)
}, 10*time.Second, 500*time.Millisecond, "nodes should have expected routes")
// nodes is usable here because it was declared at function scope
Inside EventuallyWithT blocks, use the *WithCollect variants so
assertion failures restart the wait loop instead of failing the test
immediately:
requirePeerSubnetRoutesWithCollect(c, status, expected) —
integration/route_test.go:2941requireNodeRouteCountWithCollect(c, node, announced, approved, subnet) —
integration/route_test.go:2958assertTracerouteViaIPWithCollect(c, traceroute, ip) —
integration/route_test.go:2898When you write a new helper to be called inside EventuallyWithT, it
must accept *assert.CollectT as its first parameter, not *testing.T.
The order of headscale.ListNodes() is not stable. Tests that index
nodes[0] will break when node ordering changes. Look nodes up by ID,
hostname, or tag:
// WRONG — relies on array position
require.Len(t, nodes[0].GetAvailableRoutes(), 1)
// CORRECT — find the node that should have the route
expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
for _, node := range nodes {
nodeIDStr := fmt.Sprintf("%d", node.GetId())
if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
assert.Contains(t, node.GetAvailableRoutes(), route)
}
}
func TestRouteAdvertisementBasic(t *testing.T) {
IntegrationSkip(t)
t.Parallel()
spec := ScenarioSpec{
NodesPerUser: 2,
Users: []string{"user1"},
}
scenario, err := NewScenario(spec)
require.NoError(t, err)
defer scenario.ShutdownAssertNoPanics(t)
err = scenario.CreateHeadscaleEnv([]tsic.Option{}, hsic.WithTestName("route"))
require.NoError(t, err)
allClients, err := scenario.ListTailscaleClients()
require.NoError(t, err)
headscale, err := scenario.Headscale()
require.NoError(t, err)
// --- Blocking: advertise the route on one client ---
router := allClients[0]
_, _, err = router.Execute([]string{
"tailscale", "set",
"--advertise-routes=10.33.0.0/16",
})
require.NoErrorf(t, err, "advertising route: %s", err)
// --- Eventually: headscale should see the announced route ---
var nodes []*v1.Node
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err = headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes, 2)
for _, node := range nodes {
if node.GetName() == router.Hostname() {
requireNodeRouteCountWithCollect(c, node, 1, 0, 0)
}
}
}, 10*time.Second, 500*time.Millisecond, "route should be announced")
// --- Blocking: approve the route via headscale CLI ---
var routerNode *v1.Node
for _, node := range nodes {
if node.GetName() == router.Hostname() {
routerNode = node
break
}
}
require.NotNil(t, routerNode)
_, err = headscale.ApproveRoutes(routerNode.GetId(), []string{"10.33.0.0/16"})
require.NoError(t, err)
// --- Eventually: a peer should see the approved route ---
peer := allClients[1]
assert.EventuallyWithT(t, func(c *assert.CollectT) {
status, err := peer.Status()
assert.NoError(c, err)
for _, peerKey := range status.Peers() {
if peerKey == router.PublicKey() {
requirePeerSubnetRoutesWithCollect(c,
status.Peer[peerKey],
[]netip.Prefix{netip.MustParsePrefix("10.33.0.0/16")})
}
}
}, 10*time.Second, 500*time.Millisecond, "peer should see approved route")
}
IntegrationSkip(t): the test runs outside Docker and
fails in confusing ways. Always the first line.require inside EventuallyWithT: aborts after the first
iteration instead of retrying. Use assert.* + the *WithCollect
helpers.EventuallyWithT: hides real
failures. Keep mutation outside, query inside.err from client.Status(): retry only retries the
whole block; don't silently drop errors from mid-block calls.Tests save comprehensive artefacts to control_logs/{runID}/. Read them
in this order: server stderr, client stderr, MapResponse JSON, database
snapshot. The full debugging workflow, heuristics, and failure patterns
are documented in ../cmd/hi/README.md.