A DNS App for Technitium DNS Server that provides automated failover based on continuous health monitoring of backend resources. The app monitors configured endpoints and returns DNS responses based on their availability, enabling high-availability DNS architectures that fail over without manual intervention.
The Failover App extends Technitium DNS Server with advanced health monitoring and failover capabilities for A, AAAA, and CNAME records in primary and forwarder zones. It continuously monitors backend servers using configurable health checks (ICMP ping, TCP connection, HTTP/HTTPS requests) and automatically adjusts DNS responses when failures are detected.
Key capabilities include:
This app is intended for system administrators who operate mission-critical DNS infrastructure and need automated fault tolerance.
The Failover App operates exclusively through APP records. It does NOT modify or interact with native A, AAAA, or CNAME records.
Critical operational considerations:
Processing order implications:
Example of conflicting configuration:
example.com. 300 IN A 192.0.2.1 ← This will always be returned
example.com. APP failover (primary: 192.0.2.1, secondary: 192.0.2.2) ← This will never be used
Correct configuration approach (place the APP record on a name that has no native A, AAAA, or CNAME record):
app.example.com. APP failover (primary: 192.0.2.1, secondary: 192.0.2.2)
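To make the precedence rule concrete, here is a toy Python model. It is a sketch, not Technitium's lookup code: the native_a and app_answers tables and the answer_a_query function are hypothetical, and the only behavior assumed is the one stated above, namely that a native record at a name shadows an APP record at the same name.

```python
from typing import Dict, List, Optional

# Hypothetical zone data for illustration: owner name -> native A record values.
native_a: Dict[str, List[str]] = {"example.com.": ["192.0.2.1"]}

# Hypothetical answers the Failover App would give for each APP record name
# (pretend 192.0.2.1 is down, so the app would hand out the secondary).
app_answers: Dict[str, List[str]] = {
    "example.com.": ["192.0.2.2"],
    "app.example.com.": ["192.0.2.2"],
}


def answer_a_query(name: str) -> Optional[List[str]]:
    """Toy model: a native A record at the queried name always wins,
    so an APP record at the same name is never consulted."""
    if name in native_a:
        return native_a[name]
    if name in app_answers:
        return app_answers[name]
    return None  # a real server would answer NXDOMAIN or NODATA


print(answer_a_query("example.com."))      # native record wins: ['192.0.2.1']
print(answer_a_query("app.example.com."))  # APP record answers: ['192.0.2.2']
```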
Configuration is managed through the dnsApp.config JSON file in the app directory. The file defines health checks, notification mechanisms, and networks under maintenance.
The configuration consists of four root-level arrays:
| Property | Type | Default | Description |
|---|---|---|---|
| healthChecks | Array | [] | Health check profiles with their monitoring parameters |
| emailAlerts | Array | [] | SMTP-based email notification profiles |
| webHooks | Array | [] | HTTP webhook endpoints for status change notifications |
| underMaintenance | Array | [] | Networks under maintenance; addresses inside them always report FAILED status |
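A quick way to sanity-check a config before loading it is to confirm that each root-level key is an array. The Python sketch below does that under stated assumptions: load_failover_config and load_failover_config_file are hypothetical helpers (not part of the app), and a missing array is filled with [] to mirror the defaults in the table above.

```python
import json
from pathlib import Path

# Root-level arrays from the table above; each one defaults to [].
ROOT_ARRAYS = ("healthChecks", "emailAlerts", "webHooks", "underMaintenance")


def load_failover_config(text: str) -> dict:
    """Parse dnsApp.config text and normalise its root-level arrays.

    A missing array is filled in with []; a key that is present but is not a
    JSON array raises ValueError. Hypothetical helper, not the app's parser.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("dnsApp.config must contain a JSON object at the root")
    for key in ROOT_ARRAYS:
        value = config.setdefault(key, [])
        if not isinstance(value, list):
            raise ValueError(f"{key!r} must be an array, got {type(value).__name__}")
    return config


def load_failover_config_file(path: str) -> dict:
    """Read and normalise a dnsApp.config file from disk."""
    return load_failover_config(Path(path).read_text(encoding="utf-8"))
```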
Health checks define how backend resources are monitored. Each health check profile can be referenced by name in APP record configurations.
Common Properties (All Types):
| Property | Type | Default | Description |
|---|---|---|---|
| name | String | "default" | Unique identifier for this health check profile |
| type | String | Required | Health check method: ping, tcp, http, or https |
| interval | Integer | 60 | Time between health checks, in seconds |
| retries | Integer | 3 | Number of consecutive failures before the endpoint is marked unhealthy |
| timeout | Integer | 10 | Maximum time to wait for a response, in seconds |
| emailAlert | String | null | Name of the email alert profile to use (for example "default") |
| webHook | String | null | Name of the webhook profile to use (for example "default") |
Type-Specific Properties:
| Property | Type | Applies To | Description |
|---|---|---|---|
| port | Integer | tcp | TCP port to test for connectivity |
| url | String/null | http, https | Full URL to request; if null, the URL is derived from the queried domain name |
Example Health Check Configurations:
{
"name": "ping",
"type": "ping",
"interval": 60,
"retries": 3,
"timeout": 10,
"emailAlert": "default",
"webHook": "default"
}
{
"name": "tcp443",
"type": "tcp",
"interval": 60,
"retries": 3,
"timeout": 10,
"port": 443,
"emailAlert": "default",
"webHook": "default"
}
{
"name": "https",
"type": "https",
"interval": 60,
"retries": 3,
"timeout": 10,
"url": null,
"emailAlert": "default",
"webHook": "default"
}
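The profile fields and defaults can be mirrored in a small Python dataclass, which is handy for linting a config outside the server. This is a hypothetical model, not the app's own parser: requiring port for tcp checks and rejecting duplicate names are assumptions of the sketch, while exact, case-sensitive name matching follows the troubleshooting advice later in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VALID_TYPES = {"ping", "tcp", "http", "https"}


@dataclass
class HealthCheckProfile:
    # Field names and defaults mirror the property tables above.
    type: str
    name: str = "default"
    interval: int = 60            # seconds between checks
    retries: int = 3              # consecutive failures before unhealthy
    timeout: int = 10             # seconds to wait for a response
    emailAlert: Optional[str] = None
    webHook: Optional[str] = None
    port: Optional[int] = None    # tcp only
    url: Optional[str] = None     # http/https only; None = derive from the domain

    def validate(self) -> None:
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown health check type: {self.type!r}")
        if self.type == "tcp" and self.port is None:  # assumption of this sketch
            raise ValueError(f"tcp health check {self.name!r} needs a port")
        if min(self.interval, self.retries, self.timeout) <= 0:
            raise ValueError("interval, retries and timeout must be positive")


def parse_health_checks(entries: List[dict]) -> Dict[str, HealthCheckProfile]:
    """Build a name -> profile map. Lookups are exact (case-sensitive);
    unknown keys in an entry raise TypeError from the dataclass constructor."""
    profiles: Dict[str, HealthCheckProfile] = {}
    for entry in entries:
        profile = HealthCheckProfile(**entry)
        profile.validate()
        if profile.name in profiles:
            raise ValueError(f"duplicate health check name: {profile.name!r}")
        profiles[profile.name] = profile
    return profiles
```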
Email alerts send notifications when monitored endpoints transition between health states.
| Property | Type | Default | Description |
|---|---|---|---|
| name | String | "default" | Unique identifier for this alert profile |
| enabled | Boolean | false | Whether email alerts are active |
| alertTo | Array of Strings | [] | Recipient email addresses |
| smtpServer | String | Required | SMTP server hostname or IP address |
| smtpPort | Integer | 465 | SMTP server port; typical values are 25, 465 (implicit TLS), and 587 (STARTTLS) |
| startTls | Boolean | false | Upgrade a plaintext connection to TLS with STARTTLS |
| smtpOverTls | Boolean | true | Use implicit TLS from the start of the connection |
| username | String | Required | SMTP authentication username |
| password | String | Required | SMTP authentication password |
| mailFrom | String | Required | Sender email address |
| mailFromName | String | "DNS Server Alert" | Sender display name |
Example:
{
"name": "default",
"enabled": true,
"alertTo": ["[email protected]", "[email protected]"],
"smtpServer": "smtp.gmail.com",
"smtpPort": 587,
"startTls": true,
"smtpOverTls": false,
"username": "[email protected]",
"password": "app-specific-password",
"mailFrom": "[email protected]",
"mailFromName": "DNS Failover System"
}
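The two TLS flags map onto Python's smtplib: implicit TLS corresponds to smtplib.SMTP_SSL, and STARTTLS to smtplib.SMTP followed by starttls(). The sketch below is a hedged illustration for testing an alert profile from another machine. The helper names are hypothetical, and treating the two flags as mutually exclusive is an assumption rather than documented app behavior.

```python
import smtplib
import ssl
from email.message import EmailMessage
from email.utils import formataddr


def smtp_client_class(profile: dict) -> type:
    """Map the profile's TLS flags onto smtplib.

    smtpOverTls=true  -> implicit TLS from the first byte (smtplib.SMTP_SSL, usually port 465)
    smtpOverTls=false -> smtplib.SMTP, optionally upgraded via STARTTLS (usually port 587)
    """
    over_tls = profile.get("smtpOverTls", True)   # documented default: true
    start_tls = profile.get("startTls", False)    # documented default: false
    if over_tls and start_tls:
        # Assumption of this sketch: the two modes are mutually exclusive.
        raise ValueError("set either smtpOverTls or startTls, not both")
    return smtplib.SMTP_SSL if over_tls else smtplib.SMTP


def build_alert(profile: dict, subject: str, body: str) -> EmailMessage:
    """Build the message headers from the profile fields."""
    msg = EmailMessage()
    msg["From"] = formataddr((profile.get("mailFromName", "DNS Server Alert"), profile["mailFrom"]))
    msg["To"] = ", ".join(profile.get("alertTo", []))
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(profile: dict, subject: str, body: str) -> bool:
    """Send one alert; returns False without connecting if the profile is disabled."""
    if not profile.get("enabled", False):
        return False
    cls = smtp_client_class(profile)
    context = ssl.create_default_context()  # verifies the server certificate
    host, port = profile["smtpServer"], profile.get("smtpPort", 465)
    if cls is smtplib.SMTP_SSL:
        client = smtplib.SMTP_SSL(host, port, timeout=30, context=context)
    else:
        client = smtplib.SMTP(host, port, timeout=30)
    with client:
        if profile.get("startTls", False):
            client.starttls(context=context)
        client.login(profile["username"], profile["password"])
        client.send_message(build_alert(profile, subject, body))
    return True
```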
Webhooks send HTTP POST requests with JSON payloads when health status changes.
| Property | Type | Default | Description |
|---|---|---|---|
| name | String | "default" | Unique identifier for this webhook profile |
| enabled | Boolean | false | Whether webhook notifications are active |
| urls | Array of Strings | [] | HTTP/HTTPS endpoints that receive POST notifications |
Webhook Payload Structure:
{
"timestamp": "2026-01-26T12:34:56Z",
"address": "192.0.2.1",
"domain": "app.example.com",
"type": "A",
"healthCheck": "https",
"previousStatus": "Healthy",
"currentStatus": "Failed",
"failureReason": "Connection timeout after 10000ms"
}
Example:
{
"name": "default",
"enabled": true,
"urls": [
"https://monitoring.example.com/webhooks/dns-failover",
"https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXX"
]
}
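The payload above can be reproduced for testing with Python's standard library. In this sketch the helper names (build_payload, build_request, notify) are hypothetical; the field names come from the payload example, and sending failureReason as null when there is no reason is an assumption.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import List, Optional, Tuple, Union


def build_payload(address: str, domain: str, record_type: str, health_check: str,
                  previous_status: str, current_status: str,
                  failure_reason: Optional[str] = None) -> dict:
    """Assemble a body shaped like the payload example above."""
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "address": address,
        "domain": domain,
        "type": record_type,
        "healthCheck": health_check,
        "previousStatus": previous_status,
        "currentStatus": current_status,
        "failureReason": failure_reason,  # assumption: null when there is no reason
    }


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(profile: dict, payload: dict,
           timeout: float = 10.0) -> List[Tuple[str, Union[int, str]]]:
    """POST the payload to every URL of an enabled profile.

    Returns (url, HTTP status) on success or (url, error text) on failure.
    """
    results: List[Tuple[str, Union[int, str]]] = []
    if not profile.get("enabled", False):
        return results
    for url in profile.get("urls", []):
        try:
            with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
                results.append((url, resp.status))
        except OSError as exc:  # URLError and HTTPError both derive from OSError
            results.append((url, str(exc)))
    return results
```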
Addresses inside a network marked as under maintenance always report FAILED health status, which removes them from rotation without deleting any configuration.
| Property | Type | Description |
|---|---|---|
| network | String | Network address in CIDR notation (e.g., 192.168.1.0/24, or 192.168.1.1/32 for a single host) |
| enabled | Boolean | Whether this maintenance rule is active |
Example:
{
"network": "192.168.10.2/32",
"enabled": true
}
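Maintenance matching is a CIDR containment test. A minimal Python sketch using the standard ipaddress module follows; under_maintenance is a hypothetical helper, and treating a rule with no enabled key as disabled (and accepting host bits in the network via strict=False) are assumptions of the sketch.

```python
import ipaddress
from typing import List


def under_maintenance(address: str, rules: List[dict]) -> bool:
    """True if the address falls inside any enabled underMaintenance network."""
    ip = ipaddress.ip_address(address)
    for rule in rules:
        if not rule.get("enabled", False):  # assumption: missing flag means disabled
            continue
        # strict=False tolerates host bits, e.g. "192.168.1.1/24".
        network = ipaddress.ip_network(rule["network"], strict=False)
        # An IPv4 address is never "in" an IPv6 network (and vice versa): False, no error.
        if ip in network:
            return True
    return False
```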
APP records are created in DNS zones and reference health check profiles. The app supports two record types: Address (A/AAAA) and CNAME.
Address type: Returns IP addresses based on health status, with primary/secondary failover logic.
JSON Structure:
{
"primary": {
"addresses": ["192.0.2.1", "192.0.2.2"]
},
"secondary": {
"addresses": ["198.51.100.1", "198.51.100.2"]
},
"healthCheck": "https",
"healthCheckUrl": "https://app.example.com/health"
}
Properties:
| Property | Type | Description |
|---|---|---|
| primary.addresses | Array of Strings | Primary IP addresses (IPv4/IPv6) to return when healthy |
| secondary.addresses | Array of Strings | Fallback IP addresses returned when all primary addresses fail |
| healthCheck | String | Name of the health check profile to use |
| healthCheckUrl | String (optional) | Override URL for HTTP/HTTPS checks; if omitted, defaults to https://<queried-domain> |
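The primary/secondary rule in this table can be sketched as a pure function. The Python below is illustrative only: select_addresses and the is_healthy callback are hypothetical, filtering by address family for A versus AAAA queries is an assumption, and the sketch returns an empty list when every address has failed because this document does not say what the app answers in that case.

```python
import ipaddress
from typing import Callable, List


def select_addresses(record: dict, qtype: str,
                     is_healthy: Callable[[str], bool]) -> List[str]:
    """Pick the answer for an A or AAAA query against an Address-type APP record.

    Healthy primary addresses win; secondary addresses are used only when no
    primary address of the queried family is healthy.
    """
    version = {"A": 4, "AAAA": 6}[qtype]  # assumption: answer only the queried family

    def usable(addresses: List[str]) -> List[str]:
        return [a for a in addresses
                if ipaddress.ip_address(a).version == version and is_healthy(a)]

    healthy_primary = usable(record.get("primary", {}).get("addresses", []))
    if healthy_primary:
        return healthy_primary
    # Assumption: an empty list if the secondaries are down too.
    return usable(record.get("secondary", {}).get("addresses", []))
```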
CNAME type: Returns domain names based on health status.
JSON Structure:
{
"primary": {
"domain": "server1.example.com"
},
"secondary": {
"domain": "server2.example.com"
},
"healthCheck": "https",
"healthCheckUrl": "https://server1.example.com/status"
}
Properties:
| Property | Type | Description |
|---|---|---|
| primary.domain | String | Primary domain name to return when healthy |
| secondary.domain | String | Fallback domain name returned when the primary fails |
| healthCheck | String | Name of the health check profile to use |
| healthCheckUrl | String (optional) | Override URL for HTTP/HTTPS health checks |
Special Behavior for Zone Apex:
When the queried name is the zone apex, the app returns an ANAME record instead of a CNAME, because a CNAME cannot coexist with the SOA and NS records that every zone apex must carry (RFC 1034 section 3.6.2, RFC 2181 section 10.1).
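The CNAME type's choice of target and record type can be sketched the same way. This hypothetical cname_answer function compares names case-insensitively and ignores a trailing dot, since DNS names are case-insensitive; falling back to the secondary domain without checking its own health is an assumption of the sketch.

```python
from typing import Callable, Tuple


def _norm(name: str) -> str:
    # DNS names compare case-insensitively; ignore a trailing root dot.
    return name.rstrip(".").lower()


def cname_answer(qname: str, zone_apex: str, record: dict,
                 primary_healthy: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (record type, target) for a CNAME-type APP record."""
    target = record["primary"]["domain"]
    if not primary_healthy(target):
        target = record["secondary"]["domain"]  # assumption: secondary not re-checked
    # CNAME is not allowed at the apex, so answer with ANAME there instead.
    rtype = "ANAME" if _norm(qname) == _norm(zone_apex) else "CNAME"
    return rtype, target
```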
Complete dnsApp.config demonstrating all features:
{
"healthChecks": [
{
"name": "ping",
"type": "ping",
"interval": 30,
"retries": 3,
"timeout": 5,
"emailAlert": "critical",
"webHook": "slack"
},
{
"name": "web-service",
"type": "https",
"interval": 60,
"retries": 2,
"timeout": 15,
"url": null,
"emailAlert": "critical",
"webHook": "slack"
},
{
"name": "database",
"type": "tcp",
"interval": 45,
"retries": 3,
"timeout": 10,
"port": 5432,
"emailAlert": "ops",
"webHook": "pagerduty"
}
],
"emailAlerts": [
{
"name": "critical",
"enabled": true,
"alertTo": ["[email protected]"],
"smtpServer": "smtp.gmail.com",
"smtpPort": 587,
"startTls": true,
"smtpOverTls": false,
"username": "[email protected]",
"password": "secure-app-password",
"mailFrom": "[email protected]",
"mailFromName": "DNS Failover Monitor"
},
{
"name": "ops",
"enabled": true,
"alertTo": ["[email protected]"],
"smtpServer": "smtp.office365.com",
"smtpPort": 587,
"startTls": true,
"smtpOverTls": false,
"username": "[email protected]",
"password": "another-password",
"mailFrom": "[email protected]",
"mailFromName": "Infrastructure Monitor"
}
],
"webHooks": [
{
"name": "slack",
"enabled": true,
"urls": ["https://hooks.slack.com/services/T00/B00/XXXX"]
},
{
"name": "pagerduty",
"enabled": true,
"urls": ["https://events.pagerduty.com/v2/enqueue"]
}
],
"underMaintenance": [
{
"network": "192.168.99.0/24",
"enabled": false
}
]
}
Corresponding APP Record (Address Type):
{
"primary": {
"addresses": [
"203.0.113.10",
"203.0.113.11"
]
},
"secondary": {
"addresses": [
"198.51.100.50",
"198.51.100.51"
]
},
"healthCheck": "web-service"
}
The app implements a continuous monitoring and evaluation pipeline:
Health Check Initialization: On startup, the app parses dnsApp.config and initializes health check timers based on configured intervals.
Periodic Monitoring: Each health check executes on its configured interval:
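Status Evaluation: Health check results are classified; an endpoint is marked Failed only after its health check has failed as many consecutive times as the profile's retries value.
DNS Query Processing: When a query arrives for an APP record: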
State Change Notification: When status transitions occur:
Cache Expiration Management: Health monitors expire after 1 hour of inactivity (no queries) to conserve resources.
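The status evaluation and inactivity expiry steps can be modelled as a small state machine. This Python sketch is not the app's code: HealthMonitor is hypothetical, a single successful check is assumed to restore Healthy, and no notification is sent for the first transition out of the initial Unknown state.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

INACTIVITY_EXPIRY = 3600.0  # seconds: monitors expire after 1 hour without queries


@dataclass
class HealthMonitor:
    retries: int
    on_change: Callable[[str, str], None]  # called with (previous, current) status
    status: str = "Unknown"
    consecutive_failures: int = 0
    last_queried: float = field(default_factory=time.monotonic)

    def record_result(self, ok: bool) -> None:
        """Feed one health check result into the state machine."""
        if ok:
            self.consecutive_failures = 0
            new_status = "Healthy"
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures < self.retries:
                return  # not enough consecutive failures yet; keep current status
            new_status = "Failed"
        if new_status != self.status:
            previous, self.status = self.status, new_status
            if previous != "Unknown":
                self.on_change(previous, new_status)  # email/webhook hook

    def touch(self, now: Optional[float] = None) -> None:
        """Record that a DNS query used this monitor."""
        self.last_queried = time.monotonic() if now is None else now

    def expired(self, now: Optional[float] = None) -> bool:
        """True once no query has touched the monitor for INACTIVITY_EXPIRY seconds."""
        now = time.monotonic() if now is None else now
        return now - self.last_queried >= INACTIVITY_EXPIRY
```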
Primary datacenter in US-East, secondary in EU-West. If the primary becomes unreachable, DNS responses automatically direct traffic to the secondary.
{
"primary": { "addresses": ["203.0.113.10"] },
"secondary": { "addresses": ["198.51.100.20"] },
"healthCheck": "https",
"healthCheckUrl": "https://app.example.com/health"
}
Point www.example.com to a CDN provider; fail over to the origin server if the CDN health check fails.
{
"primary": { "domain": "example.cdn.com" },
"secondary": { "domain": "origin.example.com" },
"healthCheck": "https"
}
Monitor multiple read replicas; remove failed instances from rotation automatically.
{
"primary": {
"addresses": [
"10.0.1.10",
"10.0.1.11",
"10.0.1.12"
]
},
"secondary": {
"addresses": ["10.0.2.100"]
},
"healthCheck": "database"
}
Temporarily remove a server from DNS rotation without deleting its configuration.
{
"network": "203.0.113.10/32",
"enabled": true
}
Monitor the active server; automatically fail over to the passive server and alert the operations team.
{
"primary": { "addresses": ["192.0.2.10"] },
"secondary": { "addresses": ["192.0.2.20"] },
"healthCheck": "tcp443"
}
Health check configuration:
{
"name": "tcp443",
"type": "tcp",
"port": 443,
"interval": 30,
"retries": 2,
"timeout": 5,
"emailAlert": "critical",
"webHook": "slack"
}
Use separate health check profiles for different service layers: ping for network reachability, TCP for service availability, and HTTPS for application health.
Symptoms: DNS queries return addresses with a 10-second TTL; logs show no health check activity.
Diagnostic Steps:
Verify health check configuration syntax in dnsApp.config
Check DNS server logs for initialization errors:
Apps → View Logs
Ensure health check name in APP record matches exactly (case-sensitive)
Verify interval is not set to an excessively large value
Resolution:
Symptoms: Status changes occur but no emails received.
Diagnostic Steps:
Check enabled: true in email alert configuration
Verify SMTP credentials and server connectivity:
telnet smtp.example.com 587
Review DNS server logs for SMTP errors
Test SMTP settings using an external tool (e.g., swaks)
Verify firewall permits outbound connections on SMTP port
Resolution:
Confirm the startTls and smtpOverTls settings match the server's requirements
Symptoms: Primary server is down but DNS still returns primary addresses.
Diagnostic Steps:
Query TXT record for status visibility:
dig @dns-server example.com TXT
Check the health check's retries value: an address is marked failed only after that many consecutive failures, so a high value delays failover
Verify health check type matches service (e.g., don't use ping if ICMP is blocked)
Confirm timeout is sufficient for network latency
Check if address is in underMaintenance with enabled: true
Resolution:
Lower retries for faster failover detection
Increase timeout for high-latency environments
Symptoms: Status changes are logged but the webhook endpoint receives no data.
Diagnostic Steps:
Verify webhook URL is reachable from DNS server:
curl -X POST https://webhook.example.com/endpoint -H 'Content-Type: application/json' -d '{"test":"data"}'
Check DNS server logs for HTTP errors
Confirm webhook endpoint accepts Content-Type: application/json
Test with a webhook inspection service (e.g., webhook.site)
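If an external inspection service is not an option, a throwaway local receiver can show exactly what arrives. The Python sketch below uses only http.server; the class name, the received list, and port 8080 are hypothetical choices, and it rejects non-JSON bodies with 415 to surface Content-Type mismatches.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List

received: List[dict] = []  # payloads captured so far


class WebhookInspector(BaseHTTPRequestHandler):
    """Local stand-in for an inspection service: stores each JSON POST body."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if not self.headers.get("Content-Type", "").startswith("application/json"):
            self.send_response(415)  # Unsupported Media Type
            self.end_headers()
            return
        try:
            payload = json.loads(body)
        except ValueError:  # malformed JSON
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        print("webhook received:", json.dumps(payload))
        self.send_response(204)  # No Content
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request access log


def start_inspector(port: int = 8080) -> HTTPServer:
    """Serve on localhost in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), WebhookInspector)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the sketch binds to 127.0.0.1, run it on the DNS server itself (or change the bind address), point a webHooks URL at http://127.0.0.1:8080/ while testing, and inspect received.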
Resolution:
Symptoms: Logs show that failover occurred, but clients continue using the failed servers.
Cause: Clients and recursive resolvers cache responses for the TTL of the original answer, so they keep using the old addresses until that TTL expires.
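A rough worst-case estimate helps set expectations. Assuming the outage starts just after a successful check, checks start every interval seconds, a failing check takes up to timeout seconds, the endpoint is marked failed on the retries-th consecutive failure, and a resolver cached the old answer just before the change, clients can keep using a failed server for about retries * interval + timeout + TTL seconds. The hypothetical helper below only encodes that arithmetic; it is not a guarantee of app behavior.

```python
def worst_case_stale_seconds(interval: int, retries: int, timeout: int, ttl: int) -> int:
    """Rough upper bound on how long clients may keep using a failed server.

    Assumes checks start every `interval` seconds, a failing check takes up to
    `timeout` seconds, status flips on the `retries`-th consecutive failure, and
    a resolver cached the old answer just before the flip.
    """
    detection = retries * interval + timeout  # outage begins right after a passing check
    return detection + ttl                    # cached answers live for up to one TTL


print(worst_case_stale_seconds(60, 3, 10, 300))  # 490
```

With the documented defaults (interval 60, retries 3, timeout 10) and a 300-second TTL, the estimate is 3 * 60 + 10 + 300 = 490 seconds.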
Resolution:
Symptoms: App shows as installed but records return no data.
Diagnostic Steps:
Check DNS server logs for app initialization errors
Verify dnsApp.config is valid JSON:
jq . dnsApp.config
Reload the app through the web console by saving its config again.
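Where jq is not installed, Python's json module reports the same kind of syntax error with a line and column. The config_error and check_config_file helpers below are a hypothetical sketch; they check JSON syntax only, not whether profile names or fields are valid.

```python
import json
from typing import Optional


def config_error(text: str) -> Optional[str]:
    """Return 'line L, column C: message' for invalid JSON, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


def check_config_file(path: str = "dnsApp.config") -> Optional[str]:
    """Read a config file from disk and report its first syntax error, if any."""
    with open(path, encoding="utf-8") as fh:
        return config_error(fh.read())
```

For example, print(check_config_file("/path/to/dnsApp.config") or "valid JSON") prints either the error location or a confirmation.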
Resolution: