docs/self-hosting/reference-architectures/on-prem-k8s-ha.mdx
Deploying Infisical on-premise with high availability requires expertise in networking, container orchestration, and database management. This guide serves as a reference architecture and a starting point. Actual deployments may vary depending on your organization's existing infrastructure and capabilities.
```mermaid
flowchart TB
    subgraph GLB["Global LB (HAProxy/NGINX)"]
    end

    subgraph OS["Object Storage"]
        direction LR
        store["S3/MinIO/Enterprise Storage"]
        subgraph store_contents["Storage Contents"]
            wal["PostgreSQL WAL"]
            pgbackup["PostgreSQL Backups"]
            redisbackup["Redis Backups"]
        end
    end

    subgraph DC1["Active Data Center"]
        direction TB
        subgraph k8s1["Kubernetes Cluster"]
            ing1["Ingress Controller"]
            app1["Infisical Deployment"]
            subgraph db1["CloudNativePG"]
                pg1p["PostgreSQL Primary"]
                pg1r["PostgreSQL Replicas"]
            end
            subgraph red1["Redis (Bitnami)"]
                rp1["Redis Primary"]
            end
        end
    end

    subgraph DC2["Passive Data Center"]
        direction TB
        subgraph k8s2["Kubernetes Cluster"]
            ing2["Ingress Controller"]
            app2["Infisical Deployment"]
            subgraph db2["CloudNativePG"]
                pg2["PostgreSQL Replicas"]
            end
            subgraph red2["Redis (Bitnami)"]
                r2["Redis Standby"]
            end
        end
    end

    %% Connections
    GLB --> ing1
    GLB -.-> ing2

    %% Database connections
    pg1p --> store
    store --> pg2

    %% Redis backup flow
    rp1 --> store
    store -.-> r2

    %% Intra-DC connections
    ing1 --> app1
    app1 --> db1
    app1 --> red1
    ing2 --> app2
    app2 --> db2
    app2 --> red2

    classDef primary fill:#f96,stroke:#333
    classDef replica fill:#69f,stroke:#333
    classDef storage fill:#9c6,stroke:#333
    classDef lb fill:#c9f,stroke:#333

    class pg1p,rp1 primary
    class pg1r,pg2,r2 replica
    class store,wal,pgbackup,redisbackup storage
    class GLB,ing1,ing2 lb
```
The architecture above makes use of Kubernetes for orchestrating both stateless and stateful components. The architecture spans multiple data centers for increased redundancy, availability and disaster recovery capabilities using an active-passive configuration.
While managing databases within Kubernetes has typically been complex, modern operators like CloudNativePG simplify this process by handling storage provisioning, persistent volume management, and backup/recovery processes. However, if you lack deep expertise in Kubernetes operators or database management, we recommend a hybrid approach where the database is on a managed service for production deployments.
<Warning>
  Managing stateful components like databases can be challenging without deep expertise or a dedicated in-house database management team. To simplify operations and reduce complexity, we recommend offloading databases to managed services from AWS/GCP. These managed services automatically handle provisioning, scaling, failover, backups and rollbacks.
</Warning>

Infisical is deployed on a Kubernetes cluster, which allows for container management, auto-scaling, and self-healing capabilities. A load balancer sits in front of the Kubernetes cluster, directing traffic and making sure there is an even load distribution across the application nodes. This is the entry point where all other services will interact with Infisical.
The architecture requires S3-compatible object storage for database backups and cross-datacenter replication. This can be provided by:
The object storage must be accessible from all Kubernetes clusters and provides:
The database layer is powered by PostgreSQL, managed by the CloudNativePG operator for high availability:
Redis is deployed using the Bitnami Helm chart in a simple primary configuration:
PostgreSQL is the single source of truth for nearly all application data on Infisical.
CloudNativePG provides well-defined backup and restore capabilities:
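To illustrate how a CloudNativePG cluster can be pointed at the S3-compatible object storage, here is a minimal sketch that builds a `Cluster` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` also accepts. The cluster name, bucket path, endpoint URL, secret name, secret keys, storage size, and retention period are all placeholder assumptions. The field names follow the operator's built-in Barman object store configuration as we understand it, so check them against the CloudNativePG version you run before relying on them.

```python
import json


def cnpg_cluster_manifest(name, instances, bucket_path, endpoint_url, secret_name):
    """Build a CloudNativePG Cluster manifest that sends WAL files and
    base backups to S3-compatible object storage.

    All argument values are supplied by the caller; nothing here is an
    Infisical default.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus (instances - 1) replicas in this data center.
            "instances": instances,
            # Placeholder volume size (assumption); size it for your data.
            "storage": {"size": "20Gi"},
            "backup": {
                "barmanObjectStore": {
                    "destinationPath": bucket_path,
                    # Needed for non-AWS endpoints such as MinIO.
                    "endpointURL": endpoint_url,
                    "s3Credentials": {
                        # Hypothetical secret keys; match your own Secret.
                        "accessKeyId": {"name": secret_name, "key": "ACCESS_KEY_ID"},
                        "secretAccessKey": {"name": secret_name, "key": "ACCESS_SECRET_KEY"},
                    },
                },
                # Placeholder retention window (assumption).
                "retentionPolicy": "30d",
            },
        },
    }


if __name__ == "__main__":
    manifest = cnpg_cluster_manifest(
        name="infisical-pg",                      # hypothetical cluster name
        instances=3,
        bucket_path="s3://infisical-backups/pg",  # hypothetical bucket
        endpoint_url="http://minio.example.internal:9000",  # hypothetical endpoint
        secret_name="object-store-creds",         # hypothetical Secret
    )
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from code is only one option. The same structure can be written directly as YAML or set through Helm values.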
Each Redis instance is backed up through a Kubernetes CronJob that:
- Runs the Redis `SAVE` command
- Uploads the resulting `dump.rdb` to object storage

During failover, the latest Redis backup is restored from object storage to the passive data center. This process is manual and requires operator intervention.
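Because the restore is manual, it helps if the backup job names its uploads so that the latest one is easy to identify. The sketch below shows one possible naming scheme and a helper that picks the newest `dump.rdb` for an instance. The `redis-backups` prefix, the key layout, and the instance names are assumptions for illustration, not something Infisical or the Bitnami chart prescribes. The actual `SAVE` and upload steps are left to whatever tooling your CronJob uses.

```python
from datetime import datetime, timezone


def backup_key(instance, taken_at, prefix="redis-backups"):
    """Return the object key for a dump.rdb taken at `taken_at`.

    `taken_at` should be timezone-aware. The UTC timestamp format sorts
    lexicographically in chronological order.
    """
    stamp = taken_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{instance}/{stamp}/dump.rdb"


def latest_backup(keys, instance, prefix="redis-backups"):
    """Pick the newest backup key for `instance`, or None if there is none."""
    wanted = f"{prefix}/{instance}/"
    candidates = [
        key for key in keys
        if key.startswith(wanted) and key.endswith("/dump.rdb")
    ]
    # Sortable timestamps mean the greatest key is the most recent backup.
    return max(candidates, default=None)


if __name__ == "__main__":
    older = backup_key("redis-dc1", datetime(2024, 1, 1, tzinfo=timezone.utc))
    newer = backup_key("redis-dc1", datetime(2024, 1, 2, 12, 30, tzinfo=timezone.utc))
    print(latest_backup([older, newer], "redis-dc1"))
```

During a failover, an operator would list the bucket, pass the keys to `latest_backup`, and restore that object into the standby Redis in the passive data center.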
Infisical can be deployed across multiple data centers in an active-passive configuration for disaster recovery. In this setup, one data center serves as the active site while others remain as passive standbys.
The active data center contains:
Passive data centers act as disaster recovery sites. Each contains:
Traffic routing between data centers requires:
The global load balancer should be deployed in a highly available configuration across multiple locations to avoid it becoming a single point of failure.
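The routing decision the global load balancer makes in an active-passive setup can be sketched as follows. This is an illustration of the selection logic only: in practice it lives in your HAProxy or NGINX configuration, and the `DataCenter` type, its `healthy` flag, and the site names are hypothetical. Since restoring Redis in the passive site is a manual step, you may want traffic to shift only after an operator confirms that the passive site is ready, and report it as healthy only then.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    name: str
    role: str      # "active" or "passive"
    healthy: bool  # result of whatever health check you use (assumption)


def choose_backend(sites):
    """Return the data center that should receive traffic.

    The active site is preferred. A passive site is used only when no
    healthy active site exists. Returns None if nothing is healthy.
    """
    for role in ("active", "passive"):
        for site in sites:
            if site.role == role and site.healthy:
                return site
    return None


if __name__ == "__main__":
    sites = [
        DataCenter("dc1", "active", healthy=False),
        DataCenter("dc2", "passive", healthy=True),
    ]
    target = choose_backend(sites)
    print(target.name if target else "no healthy data center")
```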
During normal operation:
During failover:
CloudNativePG manages replication across data centers:
If using MinIO for object storage, ensure: