{% answer %}
GoFr exposes Prometheus metrics on METRICS_PORT (default 2121), which Kubernetes HPA v2 can read through prometheus-adapter. You can scale on CPU plus custom application signals, such as requests-per-second derived from GoFr's default HTTP histogram, by writing a discovery rule in the adapter and a HorizontalPodAutoscaler manifest that references it.
{% /answer %}
{% howto name="Autoscale a GoFr service with HPA" description="Configure Kubernetes Horizontal Pod Autoscaler against GoFr metrics — CPU first, custom request rate via prometheus-adapter when needed." steps=[{"name": "Enable metrics-server", "text": "kubectl apply the metrics-server manifest (or enable the minikube addon) so HPA can read pod CPU and memory."}, {"name": "Set resource requests", "text": "Set resources.requests.cpu on the GoFr Deployment — HPA computes utilization as a percentage of this."}, {"name": "Apply a CPU-based HPA", "text": "Apply autoscaling/v2 HPA with target averageUtilization on cpu (60% is a sane start); set min and max replicas based on baseline traffic."}, {"name": "Tune scale behavior", "text": "Set behavior.scaleUp.stabilizationWindowSeconds to absorb spikes and behavior.scaleDown to avoid flapping."}, {"name": "Optional: custom metrics", "text": "Install prometheus-adapter and define a rule that exposes app_http_response_count as a custom metric for HPA targeting."}, {"name": "Verify under load", "text": "Generate load with hey or k6; watch kubectl get hpa to confirm replicas grow and shrink as expected."}] /%}
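For the first step, the upstream manifest works on most clusters; as a sketch (pin a released version in production rather than `latest`):

```bash
# Install metrics-server so HPA can read pod CPU/memory:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# On minikube, the addon is equivalent:
minikube addons enable metrics-server
```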
Reach for HPA when traffic is bursty and a fixed replica count either over-provisions during quiet periods or under-serves during spikes. CPU autoscaling alone tends to lag behind I/O-bound workloads — a GoFr service waiting on a downstream HTTP call has low CPU but a long queue. Custom-metric HPA on QPS or latency closes that gap. For event-driven workloads (Kafka, NATS, MQTT) HPA cannot scale to zero; use KEDA for that.
GoFr publishes a {% new-tab-link newtab=true title="default set of HTTP, datasource, and runtime metrics" href="/docs/quick-start/observability" /%} on METRICS_PORT at /metrics. The HTTP server records app_http_response (a histogram), so requests-per-second can be derived as rate(app_http_response_count[1m]). You can also publish your own counters and histograms — see Publishing Custom Metrics.
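A quick sanity check that the series exists, assuming the service is running locally with the default METRICS_PORT:

```bash
# Confirm the histogram series GoFr exposes out of the box:
curl -s http://localhost:2121/metrics | grep app_http_response
```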
Make sure your Pod template advertises the metrics port and a Prometheus scrape annotation (or a ServiceMonitor if you run prometheus-operator):
```yaml
ports:
  - name: http
    containerPort: 8000
  - name: metrics
    containerPort: 2121
```
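If you rely on annotation-based discovery rather than a ServiceMonitor, a common convention is the `prometheus.io/*` annotations — note these only work if your Prometheus scrape config honors them; they are not built into Prometheus:

```yaml
# Pod template annotations — assumes your Prometheus scrape config
# honors the prometheus.io/* convention:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "2121"
  prometheus.io/path: "/metrics"
```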
prometheus-adapter exposes Prometheus series as custom.metrics.k8s.io so HPA can query them. A minimal rule that surfaces per-pod RPS for a GoFr Deployment looks like:
```yaml
rules:
  - seriesQuery: 'app_http_response_count{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }
        pod: { resource: pod }
    name:
      matches: "^app_http_response_count$"
      as: "http_requests_per_second"
    metricsQuery: |
      sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
```
Verify that the adapter serves the metric:

```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/http_requests_per_second"
```
With the custom metric discoverable, tie it together in the HPA manifest:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
  namespace: prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "50"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
```
The behavior block is the difference between an HPA that flaps and one that holds. Short scaleUp.stabilizationWindowSeconds reacts to bursts; long scaleDown.stabilizationWindowSeconds prevents thrashing when traffic drops momentarily.
A few gotchas:

- Set `minReadySeconds` on the Deployment and a `readinessProbe` against `/.well-known/health` so HPA doesn't count not-ready pods toward capacity.
- HPA computes CPU utilization as usage / request, so if the Deployment omits `resources.requests.cpu`, CPU-based scaling is silently disabled.
- `minReplicas: 0` is rejected by the API server; if you need scale-to-zero for cron-like workloads, use KEDA.
- Custom metrics flow through `custom.metrics.k8s.io`; until prometheus-adapter serves the metric there, the HPA reports `<unknown>`.

Observe the HPA under real traffic:

```bash
kubectl get hpa orders-api -n prod
kubectl describe hpa orders-api -n prod
kubectl top pods -n prod -l app=orders-api
```
`kubectl describe` prints the Metrics block with current vs. target values; mismatched units (e.g., `m` vs. whole numbers) are the most common reason an HPA reports `<unknown>`.
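To exercise the full loop, generate sustained load and watch replicas track it. A sketch with `hey` — the in-cluster URL is hypothetical; substitute your Service address:

```bash
# 10 workers, rate-limited to 100 req/s each (~1000 req/s total), for 2 minutes:
hey -z 2m -c 10 -q 100 http://orders-api.prod.svc.cluster.local:8000/orders

# In another terminal, watch the HPA react:
kubectl get hpa orders-api -n prod --watch
```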
{% faq %}
{% faq-item question="Does GoFr need any code changes for HPA to work?" %}
No. GoFr already exposes Prometheus-format metrics on METRICS_PORT (default 2121). HPA configuration lives entirely in the adapter rule and the HPA manifest.
{% /faq-item %}
{% faq-item question="Can I scale a GoFr Pub/Sub subscriber with HPA?" %}
HPA can scale on CPU, but consumer-lag-based scaling is better handled by KEDA's Kafka or NATS scalers, which can also scale to zero between batches.
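For reference, a minimal KEDA `ScaledObject` for a Kafka consumer looks roughly like this (illustrative names; `lagThreshold` is the per-replica lag target):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer        # hypothetical subscriber Deployment
spec:
  scaleTargetRef:
    name: orders-consumer
  minReplicaCount: 0           # KEDA, unlike HPA, can scale to zero
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: orders
        topic: orders
        lagThreshold: "50"
```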
{% /faq-item %}
{% faq-item question="Why does my HPA show <unknown> for the custom metric?" %}
Either prometheus-adapter has not discovered the series yet, the metric name in the HPA manifest does not match the rule's as: value, or the labels (namespace, pod) are missing on the Prometheus series.
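To narrow it down, list what the adapter is actually serving:

```bash
# If http_requests_per_second is absent from this listing, the adapter
# rule (or its label requirements) is the problem, not the HPA:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
```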
{% /faq-item %}
{% /faq %}