docs/vpnhotspotd/lifecycle.md
vpnhotspotd is started lazily by DaemonController when the app sends the first
daemon command. The daemon stays alive only while
the controller has active calls. When the last call is closed, Kotlin closes the
control connection; the Rust control loop then stops all daemon-owned runtime
state and exits.
Kotlin locates the native vpnhotspotd library in the APK and runs it through
Android's linker from a root command. It creates:
The Rust entry point accepts exactly one argument: that socket name. It connects back to the abstract Unix socket, splits the stream, starts a writer task for outbound frames, initializes the nonfatal reporter, and builds process-wide bookkeeping:
The rtnetlink runtime is created on the first command that needs netlink or routing state: session start, neighbour monitoring, static-address replacement, or Clean. Commands that do not need netlink, such as traffic-counter reads, do not open the rtnetlink connection. Once created, the runtime remains process-wide until daemon exit.
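The lazy, create-once behaviour can be sketched with the standard library's `OnceLock`. This is a minimal illustration, not the daemon's code: `Kind`, `NetlinkRuntime`, and `Daemon` are hypothetical names, and the stand-in runtime only records which command opened it.

```rust
use std::sync::OnceLock;

/// Hypothetical command kinds. Only the split between "needs netlink" and
/// "does not need netlink" is taken from the text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    StartSession,
    NeighbourMonitor,
    ReplaceStaticAddresses,
    Clean,
    ReadTrafficCounters,
}

impl Kind {
    pub fn needs_netlink(self) -> bool {
        !matches!(self, Kind::ReadTrafficCounters)
    }
}

/// Stand-in for the real rtnetlink runtime (assumption).
pub struct NetlinkRuntime {
    pub opened_by: Kind,
}

/// Process-wide holder: the runtime is created at most once and then kept.
pub struct Daemon {
    netlink: OnceLock<NetlinkRuntime>,
}

impl Daemon {
    pub fn new() -> Self {
        Daemon { netlink: OnceLock::new() }
    }

    /// Returns the runtime only for commands that need it, creating it on
    /// first use. Commands that do not need netlink never open it.
    pub fn netlink_for(&self, kind: Kind) -> Option<&NetlinkRuntime> {
        if !kind.needs_netlink() {
            return None;
        }
        Some(self.netlink.get_or_init(|| NetlinkRuntime { opened_by: kind }))
    }

    pub fn netlink_opened(&self) -> bool {
        self.netlink.get().is_some()
    }
}
```

In this model a traffic-counter read leaves `netlink_opened()` false, and whichever netlink-using command arrives first is the one that creates the runtime.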
The daemon does not listen for arbitrary clients. The app-side controller owns
the listening socket and accepts only a peer whose Unix socket credentials have
uid=0; non-root peers are closed and the controller keeps waiting within the
startup timeout. The daemon connects to that single controller.
The control loop decodes one client envelope at a time and dispatches each
non-cancel command into a task tracked by call ID. CancelCommand is handled
before dispatch and cancels the active call's cancellation token.
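A rough sketch of that dispatch rule follows, assuming a shared atomic flag as a stand-in for the real cancellation token. `Envelope`, `CancelFlag`, and `ControlLoop` are hypothetical names, and task spawning is left out.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical decoded envelope: either a cancel or some other command.
pub enum Envelope {
    Cancel { call_id: u64 },
    Command { call_id: u64 },
}

/// Stand-in for a cancellation token: a shared flag the call task polls.
pub type CancelFlag = Arc<AtomicBool>;

#[derive(Default)]
pub struct ControlLoop {
    active: HashMap<u64, CancelFlag>,
}

impl ControlLoop {
    /// Handles one envelope. Cancel is resolved here, before any dispatch.
    /// Every other command gets a tracked flag, returned for its task to poll.
    pub fn handle(&mut self, envelope: Envelope) -> Option<CancelFlag> {
        match envelope {
            Envelope::Cancel { call_id } => {
                if let Some(flag) = self.active.get(&call_id) {
                    flag.store(true, Ordering::SeqCst);
                }
                None
            }
            Envelope::Command { call_id } => {
                let flag = CancelFlag::default();
                self.active.insert(call_id, Arc::clone(&flag));
                Some(flag)
            }
        }
    }

    /// Called when a call task finishes, so its ID stops being tracked.
    pub fn finish(&mut self, call_id: u64) {
        self.active.remove(&call_id);
    }
}
```

A cancel for an unknown call ID is ignored in this sketch; how the daemon treats that case is not stated here.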
There are two call shapes:
StartSessionCommand and StartNeighbourMonitorCommand are event calls.
ReplaceSessionCommand, ReadTrafficCountersCommand,
ReplaceStaticAddressesCommand, and CleanRoutingCommand are one-shot calls.
StartSessionCommand sends an event ACK after the session is established, then
keeps the call active as the session owner. The session event stream may later
carry optional daemon-to-routing requests, such as an IPsec forwarding-policy
update request. StartNeighbourMonitorCommand sends an initial
neighbour/topology snapshot and then streams updates. The protobuf schema still
describes the frames; "event-style call" only describes the controller lifetime
shape.
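The two shapes can be written down as a small classification. The command names come from the text above; the `Command` and `CallShape` enums are illustrative and are not the protobuf schema.

```rust
/// Command names are taken from the text; the enum itself is illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Command {
    StartSession,
    StartNeighbourMonitor,
    ReplaceSession,
    ReadTrafficCounters,
    ReplaceStaticAddresses,
    CleanRouting,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CallShape {
    /// Stays active and streams frames until the call is cancelled.
    Event,
    /// Produces a single reply and completes.
    OneShot,
}

impl Command {
    pub fn shape(self) -> CallShape {
        match self {
            Command::StartSession | Command::StartNeighbourMonitor => CallShape::Event,
            Command::ReplaceSession
            | Command::ReadTrafficCounters
            | Command::ReplaceStaticAddresses
            | Command::CleanRouting => CallShape::OneShot,
        }
    }
}
```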
Call IDs are part of lifecycle ownership. A session is stored under the call ID that started it. Closing that event call is the normal request to stop that session.
In these docs, a session means the daemon-owned runtime for one downstream
interface named by SessionConfig.downstream. It bundles that interface's DNS
proxy listeners, optional NAT66 proxy state, and routing mutations. It is not a
client connection and it is not an upstream network.
StartSessionCommand reserves a session slot before doing setup. The daemon
rejects a second active session for the same downstream interface. If an existing
session for that downstream has already been cancelled and is tearing down, the
new start waits for the old session to finish teardown and remove its daemon
slot, then retries insertion. If the new start is cancelled while waiting, it
exits without starting a session. If IPv6 NAT is requested, the process-wide
IPv6 NAT firewall base chains are attempted before the session runtime starts.
Failure there is reported as a structured nonfatal tied to the start call and
IPv6 NAT is disabled for that session start.
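The slot rules can be modelled as a small state table keyed by downstream interface name. This is a sketch under assumptions: `Slots`, `SlotState`, and `Reserve` are hypothetical names, and the actual waiting is reduced to a `WaitForTeardown` result that the caller would retry after teardown finishes.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum SlotState {
    Active,
    TearingDown,
}

#[derive(Debug, PartialEq)]
pub enum Reserve {
    /// The slot was free and is now held by the new start.
    Reserved,
    /// Another session for this downstream is still active.
    Rejected,
    /// The old session was cancelled; retry once it has removed its slot.
    WaitForTeardown,
}

#[derive(Default)]
pub struct Slots {
    by_downstream: HashMap<String, SlotState>,
}

impl Slots {
    pub fn try_reserve(&mut self, downstream: &str) -> Reserve {
        match self.by_downstream.get(downstream) {
            Some(SlotState::Active) => Reserve::Rejected,
            Some(SlotState::TearingDown) => Reserve::WaitForTeardown,
            None => {
                self.by_downstream
                    .insert(downstream.to_string(), SlotState::Active);
                Reserve::Reserved
            }
        }
    }

    /// Marks the session as cancelled; its slot stays until teardown ends.
    pub fn cancel(&mut self, downstream: &str) {
        if let Some(state) = self.by_downstream.get_mut(downstream) {
            *state = SlotState::TearingDown;
        }
    }

    /// Teardown finished: the slot is removed and a waiting start may retry.
    pub fn finish_teardown(&mut self, downstream: &str) {
        self.by_downstream.remove(downstream);
    }
}
```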
Session::start constructs the session in this order:
Downstream IPv4 discovery is still required before a session can be established. After that point, DNS, NAT66, and routing setup failures remove only the affected MAC/protocol capability or mutation from the best-effort setup result.
After the session is installed, Rust publishes a session-control handle and sends an event ACK. Read, replace, and stop operations enqueue commands through that handle; the start-session task owns the session runtime and processes those commands in order. When cancelled normally, it removes the control handle from the slot, drains already queued commands, stops the session runtimes, removes the session from daemon state, and releases any same-downstream start waiting behind that teardown.
After the ACK, the daemon updates process-wide IPsec tracking for each active
session's upstream interface names and upstream generation. The tracked upstream
set is the union of primary and fallback upstream interfaces because either role
can be used by the installed routing policy for a given packet. On Android 12+,
if a session's upstream set or upstream generation changes, the daemon spawns a
best-effort global probe that runs /system/bin/dumpsys ipsec. Probe requests
are coalesced while one is already running; upstream churn does not queue a
trailing probe. The probe parses every matching IPv4 tunnel forwarding-policy
target in the dump, and the process-wide tracker emits only newly observed
targets whose interface is still in an active session's upstream set. Repeated
probes that observe the same target do not emit another request. No-match is
quiet; dumpsys or parser failures are structured global nonfatals. The daemon
clears its emitted-target record when the target disappears from a later probe or
its interface leaves all session upstream sets. The daemon does not separately
supervise a stuck dumpsys process, and it does not track or clean up IPsec
policy state; tunnel and policy teardown remain platform-owned.
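The coalescing and emit-once rules can be sketched as a small tracker. Everything here is an assumption made for illustration: `IpsecTracker` and `Target` are hypothetical names, a target is reduced to an (interface, text) pair, and the emitted-target record is pruned only when a probe finishes, whereas the daemon also reacts when an interface leaves the upstream sets between probes.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical forwarding-policy target: (interface name, opaque target text).
pub type Target = (String, String);

#[derive(Default)]
pub struct IpsecTracker {
    probe_running: bool,
    emitted: BTreeSet<Target>,
}

impl IpsecTracker {
    /// Returns true if a probe should be spawned now. Requests made while a
    /// probe is running are dropped, so no trailing probe is queued.
    pub fn request_probe(&mut self) -> bool {
        if self.probe_running {
            return false;
        }
        self.probe_running = true;
        true
    }

    /// Applies a finished probe. Returns only the targets to emit now:
    /// those on a tracked upstream interface that were not emitted before.
    pub fn finish_probe(&mut self, seen: &[Target], upstreams: &HashSet<String>) -> Vec<Target> {
        self.probe_running = false;
        let live: BTreeSet<Target> = seen
            .iter()
            .filter(|target| upstreams.contains(&target.0))
            .cloned()
            .collect();
        let fresh: Vec<Target> = live.difference(&self.emitted).cloned().collect();
        // Forget targets that disappeared or whose interface is no longer
        // tracked, so they can be emitted again if they come back.
        self.emitted = live;
        fresh
    }
}
```

With this model, a target that shows up in two consecutive probes is emitted once, and a target that vanishes and later reappears is emitted again.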
ReplaceSessionCommand updates the config for an existing session. The
downstream interface is immutable; replacing it is rejected because routing and
session ownership are keyed to that interface. Replacement is ordered through
the session-control command loop, so it cannot interleave with traffic-counter
reads or session stop.
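The ordering guarantee can be illustrated with a single owner thread that drains a channel. This is a simplified sketch, not the daemon's task structure: `SessionCommand` and `spawn_session` are hypothetical, a thread stands in for the start-session task, and the counter returned by reads is just the number of committed replacements.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{spawn, JoinHandle};

/// Hypothetical commands sent through the session-control handle.
pub enum SessionCommand {
    /// Replace the config; the downstream interface must stay the same.
    Replace { downstream: String, reply: Sender<bool> },
    /// Read a stand-in counter (here: the number of committed replacements).
    ReadCounters { reply: Sender<u64> },
}

/// Spawns the owner of one session and returns its control handle.
/// The owner thread returns how many replacements it committed.
pub fn spawn_session(downstream: String) -> (Sender<SessionCommand>, JoinHandle<u64>) {
    let (handle, queue) = channel::<SessionCommand>();
    let owner = spawn(move || {
        let mut committed = 0u64;
        // Commands run one at a time, in arrival order. Once every handle is
        // dropped, the loop still drains what was queued, then ends.
        for command in queue {
            match command {
                SessionCommand::Replace { downstream: next, reply } => {
                    let accepted = next == downstream;
                    if accepted {
                        committed += 1;
                    }
                    let _ = reply.send(accepted);
                }
                SessionCommand::ReadCounters { reply } => {
                    let _ = reply.send(committed);
                }
            }
        }
        committed
    });
    (handle, owner)
}
```

Because only the owner touches session state, a read queued after a replacement always observes that replacement's result, and a replacement naming a different downstream is refused without changing anything.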
Inside that ordered replacement, the session holds its config mutex as a commit gate for DNS and NAT66 readers:
This means active DNS/NAT66 work that needs a config snapshot can pause behind replacement, but it cannot observe the next config before routing has committed the matching interception state.
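A minimal sketch of a mutex used as a commit gate, assuming the config can be reduced to a generation number. `CommitGate` and its methods are hypothetical, and the routing commit is passed in as a closure so the example can observe what a reader sees while the commit runs.

```rust
use std::sync::Mutex;

/// Hypothetical gate: the config is modelled as a single generation number.
pub struct CommitGate {
    config: Mutex<u32>,
}

impl CommitGate {
    pub fn new(generation: u32) -> Self {
        CommitGate { config: Mutex::new(generation) }
    }

    /// Blocking read used by DNS/NAT66 work that needs a config snapshot.
    pub fn snapshot(&self) -> u32 {
        *self.config.lock().unwrap()
    }

    /// Non-blocking read: None while a replacement holds the gate.
    pub fn try_snapshot(&self) -> Option<u32> {
        self.config.try_lock().ok().map(|guard| *guard)
    }

    /// Holds the gate across the routing commit, then publishes the config.
    /// Readers therefore see the old config or the new one, never the new
    /// config before routing has committed.
    pub fn replace<F: FnOnce()>(&self, next: u32, commit_routing: F) {
        let mut current = self.config.lock().unwrap();
        commit_routing();
        *current = next;
    }
}
```

A reader that calls `snapshot` during a replacement simply blocks until the guard is released, which corresponds to pausing behind the replacement.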
Client changes are MAC-scoped. Replacement stages DNS and NAT66 resources for new MAC/protocol capabilities, reconciles routing, publishes only committed capabilities, and cancels removed or uncommitted per-MAC resources. Before a MAC or counter source is removed, the session exposes its final daemon-owned counters through the next traffic-counter read.
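The "final counters survive removal" rule might look like the following. This is a sketch with hypothetical names (`Counters`, `add`, `remove`, `read`), and it assumes cumulative byte counts per MAC, which the text does not specify.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-MAC byte counters owned by one session.
#[derive(Default)]
pub struct Counters {
    live: BTreeMap<String, u64>,
    /// Final values of removed MACs, held until the next read.
    retired: BTreeMap<String, u64>,
}

impl Counters {
    pub fn add(&mut self, mac: &str, bytes: u64) {
        *self.live.entry(mac.to_string()).or_insert(0) += bytes;
    }

    /// Removing a MAC parks its final value instead of dropping it.
    pub fn remove(&mut self, mac: &str) {
        if let Some(bytes) = self.live.remove(mac) {
            *self.retired.entry(mac.to_string()).or_insert(0) += bytes;
        }
    }

    /// Returns live counters plus any parked final values, then forgets the
    /// parked values so each one is reported exactly once.
    pub fn read(&mut self) -> BTreeMap<String, u64> {
        let mut out = self.live.clone();
        for (mac, bytes) in std::mem::take(&mut self.retired) {
            *out.entry(mac).or_insert(0) += bytes;
        }
        out
    }
}
```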
When the next client set is empty, replacement publishes no NAT66 routing capabilities. Existing NAT66 runtime state may stay alive only to preserve session-owned counters and deferred NAT66 eligibility for a later non-empty client set.
If process-wide firewall-base setup failed, NAT66 produced no runtime for a
non-empty client set, or routing committed no NAT66 TCP/UDP capability for a
non-empty client set, later replacements keep ipv6_nat disabled for that
session. An empty client set is not failure; replacement may start NAT66 when a
later neighbour snapshot adds a MAC.
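Those conditions reduce to a small predicate. The struct and field names below are hypothetical and only restate the three failure cases and the empty-client exception described above.

```rust
/// Hypothetical summary of one NAT66 setup attempt for a session.
#[derive(Clone, Copy, Debug)]
pub struct Nat66Outcome {
    pub firewall_base_ok: bool,
    pub client_count: usize,
    pub runtime_started: bool,
    /// Number of NAT66 TCP/UDP capabilities that routing committed.
    pub committed_capabilities: usize,
}

/// Returns whether later replacements may still use ipv6_nat.
pub fn ipv6_nat_stays_enabled(outcome: Nat66Outcome) -> bool {
    if !outcome.firewall_base_ok {
        return false;
    }
    // An empty client set is not a failure: NAT66 may start later.
    if outcome.client_count == 0 {
        return true;
    }
    outcome.runtime_started && outcome.committed_capabilities > 0
}
```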
After a successful replacement, the same process-wide IPsec state is updated from the session's new upstream interface union and upstream generation. A replacement triggers one IPsec probe when that interface union changes or when Kotlin reports that either upstream role has a new upstream snapshot, unless a global probe is already running. Client and downstream-only changes do not trigger a probe by themselves.
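The trigger condition can be sketched as a comparison of the old and new upstream state. `Upstreams` and `should_probe` are hypothetical names, the generation is assumed to be a single counter covering both roles, and the Android version check is omitted.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of a session's upstream state.
#[derive(Clone, Debug, Default)]
pub struct Upstreams {
    pub primary: Vec<String>,
    pub fallback: Vec<String>,
    /// Assumed to change whenever Kotlin reports a new upstream snapshot.
    pub generation: u64,
}

impl Upstreams {
    /// Union of primary and fallback interface names, ignoring role.
    pub fn interface_union(&self) -> BTreeSet<&str> {
        self.primary
            .iter()
            .chain(self.fallback.iter())
            .map(String::as_str)
            .collect()
    }
}

/// True when a replacement should spawn a probe: the union or the generation
/// changed, and no global probe is already running.
pub fn should_probe(old: &Upstreams, new: &Upstreams, probe_running: bool) -> bool {
    let changed = old.interface_union() != new.interface_union()
        || old.generation != new.generation;
    changed && !probe_running
}
```

Swapping the primary and fallback roles without touching the generation leaves the union unchanged, so it does not trigger a probe in this model.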
Normal session stop cancels the session stop token first, so DNS and NAT66 listeners usually choose shutdown over reporting teardown-time socket errors. Shutdown then removes routing state without waiting for listener or per-packet tasks to drain. Finally it stops NAT66, which may withdraw router-advertised prefixes during stop.
When the control connection closes, the daemon cancels active calls, waits for call tasks, stops the neighbour monitor, stops all sessions without extra withdraw-cleanup, clears the IPsec aggregate, removes process-wide IPv6 NAT firewall base state, drops the writer, and exits.
CleanRoutingCommand is stronger than normal shutdown. It:
withdraw_cleanup = true
Clean must not depend on private app databases, preferences, or daemon memory. Anything that can outlive the process needs a deterministic cleanup path. Traffic history is not a cleanup input. Per-MAC listeners, redirect rules, TPROXY rules, and the single NAT66 ICMPv6 NFQUEUE rule are removed through normal routing cleanup or deterministic Clean reconstruction.
The daemon allows one neighbour monitor at a time. Starting a monitor registers single-consumer netlink neighbour and link event slots, sends an initial dump with bridge topology, then streams deltas until the event call is cancelled.
Stopping the monitor drops both netlink registrations and waits for the monitor task. Link events trigger bridge topology snapshots only when the topology actually changes.
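The change-only snapshot rule can be sketched as a comparison against the last topology sent. `Topology` and `TopologyStream` are hypothetical names, and the topology is assumed to be a map from bridge name to member ports.

```rust
use std::collections::BTreeMap;

/// Hypothetical bridge topology: bridge name to its member ports.
pub type Topology = BTreeMap<String, Vec<String>>;

#[derive(Default)]
pub struct TopologyStream {
    last_sent: Option<Topology>,
}

impl TopologyStream {
    /// Called with the current topology after a link event. Returns a
    /// snapshot to send only when it differs from the last one sent; the
    /// first call always sends, which models the initial dump.
    pub fn on_link_event(&mut self, current: Topology) -> Option<Topology> {
        if self.last_sent.as_ref() == Some(&current) {
            return None;
        }
        self.last_sent = Some(current.clone());
        Some(current)
    }
}
```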