Production Tracing for GoFr

{% answer %} GoFr ships built-in OpenTelemetry tracing — every HTTP request, gRPC call, and datasource operation is traced automatically. Configure the exporter via TRACE_EXPORTER (otlp, jaeger, zipkin, or gofr) and TRACER_URL, and set TRACER_RATIO for head-based sampling; W3C Trace Context propagation flows through GoFr's HTTP service client without extra code. {% /answer %}

{% howto name="Wire production tracing for a GoFr service" description="Configure OTLP gRPC tracing in GoFr, point it at Jaeger / Tempo / Honeycomb, and tune sampling for production." steps=[{"name": "Set TRACE_EXPORTER", "text": "Set TRACE_EXPORTER=otlp in configs/.env (or an env-based ConfigMap in K8s) — GoFr ships an OTLP gRPC exporter."}, {"name": "Set TRACER_URL", "text": "Set TRACER_URL to a bare host:port (no http:// scheme) on port 4317 for OTLP gRPC; route to Jaeger collector, Tempo, or any OTLP backend."}, {"name": "Tune TRACER_RATIO", "text": "Set TRACER_RATIO to 1.0 in dev for full sampling; in prod step down to 0.1 or lower based on volume."}, {"name": "Add custom spans", "text": "Use ctx.Trace(name) inside handlers to mark sub-operations; existing HTTP, gRPC, and datasource spans are emitted automatically."}, {"name": "Verify in the backend", "text": "Hit a route, then open the Jaeger UI / Grafana Tempo and search for the service by APP_NAME — confirm spans show up with trace_id."}, {"name": "Propagate across services", "text": "GoFr injects W3C TraceContext on outbound calls via ctx.GetHTTPService — so two GoFr services share a single trace ID end to end."}] /%}

When to use this guide

You have GoFr running in Kubernetes (or any container platform) and want traces flowing into a backend — Jaeger, Grafana Tempo, an OpenTelemetry Collector, or a vendor that accepts OTLP. This guide covers exporter configuration, sampling, and propagation across multiple services.

For adding application-level spans inside handlers, see {% new-tab-link newtab=false title="Custom Spans In Tracing" href="/docs/advanced-guide/custom-spans-in-tracing" /%}.

What GoFr traces automatically

Once tracing is enabled, GoFr instruments without code changes:

  • HTTP server — every incoming request becomes a root span (or a child if upstream sent W3C trace headers).
  • HTTP client — outgoing calls via the GoFr service client (with circuit breaker / retry / rate limit) are traced and propagate context.
  • gRPC — server and client interceptors emit spans.
  • Datasources — SQL, Redis, Mongo, Cassandra, Pub/Sub publishers and subscribers (Kafka, NATS, SQS, Google Pub/Sub) emit spans for each operation.
  • Migrations — recorded as spans, useful for debugging long-running schema changes.

Custom spans (ctx.Trace("name")) add the application layer: business operations that span multiple datasource calls, or pure-CPU work you want to time.
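A minimal sketch of that pattern (the route, span name, and response are illustrative):

```go
package main

import "gofr.dev/pkg/gofr"

func main() {
	app := gofr.New()

	app.GET("/orders/{id}/total", func(ctx *gofr.Context) (any, error) {
		// Custom span for the pricing computation; it nests under the
		// automatic HTTP server span for this request.
		span := ctx.Trace("orders.compute-total")
		defer span.End()

		// ... fetch line items, apply discounts, sum the total ...
		return map[string]any{"total": 42.50}, nil
	})

	app.Run()
}
```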

Configuration

GoFr reads tracing config from environment variables. The relevant keys (verified against pkg/gofr/otel.go):

| Variable | Purpose | Default |
|---|---|---|
| TRACE_EXPORTER | One of otlp, jaeger, zipkin, gofr | unset (tracing disabled) |
| TRACER_URL | Endpoint for the chosen exporter | unset |
| TRACER_HOST | Deprecated — use TRACER_URL | unset |
| TRACER_PORT | Deprecated — use TRACER_URL | 9411 |
| TRACER_RATIO | Head-based sampling ratio (0.0–1.0) | 1 |
| TRACER_HEADERS | Custom OTLP headers, Key1=Value1,Key2=Value2 | unset |
| TRACER_AUTH_KEY | Shortcut for Authorization header | unset |

Tracing is disabled if neither TRACE_EXPORTER nor TRACER_URL is set — GoFr logs "tracing is disabled, as configs are not provided" at debug level. The sampler is ParentBased(TraceIDRatioBased(TRACER_RATIO)), so a sampling decision made upstream is honored.
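For intuition, that sampler is roughly what the following OTel SDK snippet sets up. This is an illustrative sketch only, not GoFr's actual wiring; GoFr builds all of this for you from the env vars above:

```go
// Package tracingsketch illustrates the sampler semantics only; it is not GoFr code.
package tracingsketch

import (
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// install honors the parent's sampling decision when a traceparent header
// arrived with the request, and otherwise keeps `ratio` of new root traces.
func install(exporter sdktrace.SpanExporter, ratio float64) {
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(ratio))),
		sdktrace.WithBatcher(exporter),
	)
	otel.SetTracerProvider(tp)
}
```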

zipkin is supported but deprecated; the framework logs a warning recommending otlp instead. The gofr exporter ships traces to GoFr's hosted tracer at https://tracer-api.gofr.dev/api/spans (override with TRACER_URL).
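For local development, the same keys can go straight into configs/.env (values illustrative, pointing at a local collector or all-in-one Jaeger):

```env
APP_NAME=orders
TRACE_EXPORTER=otlp
TRACER_URL=localhost:4317
TRACER_RATIO=1
```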

Backend recipes

Jaeger (OTLP gRPC)

Modern Jaeger (1.35+) accepts OTLP natively on port 4317:

```yaml
# ConfigMap fragment
TRACE_EXPORTER: "jaeger"
TRACER_URL: "jaeger-collector.observability.svc.cluster.local:4317"
TRACER_RATIO: "0.1"
```

jaeger and otlp use the same OTLP gRPC exporter under the hood — they differ only in log labeling.

Grafana Tempo / OpenTelemetry Collector

Point at any OTLP gRPC endpoint:

```yaml
TRACE_EXPORTER: "otlp"
TRACER_URL: "otel-collector.observability.svc.cluster.local:4317"
TRACER_RATIO: "0.1"
```

Running an OTel Collector as a sidecar or DaemonSet is the recommended pattern: it does tail-based sampling, batching, and can fan out to multiple backends without changing the app.

Honeycomb / Datadog / Vendor OTLP

For SaaS backends that accept OTLP and require an API key:

```yaml
TRACE_EXPORTER: "otlp"
TRACER_URL: "api.honeycomb.io:443"
TRACER_HEADERS: "x-honeycomb-team=YOUR_API_KEY,x-honeycomb-dataset=orders"
TRACER_RATIO: "0.1"
```

Or with a single auth header:

```yaml
TRACER_AUTH_KEY: "Bearer YOUR_TOKEN"
```

GoFr's OTLP exporter currently uses an insecure (cleartext) gRPC connection inside the cluster — for SaaS endpoints over the public internet, route through an OTel Collector that terminates TLS, or rely on a service mesh.

Sampling: head-based vs tail-based

TRACER_RATIO is head-based: the sampling decision is made when the trace starts. With TRACER_RATIO=0.1, 10% of root spans are kept; the other 90% are dropped at the source. Cheap, predictable, but you cannot retroactively keep a slow or errored trace that wasn't sampled.

For production-grade observability, tail-based sampling — done in an OpenTelemetry Collector with the tail_sampling processor — lets you keep all traces that contain errors or exceed a latency threshold while sub-sampling the happy path. The pattern is: app sends 100% (or a high ratio) to the local collector; collector decides what to ship onward.
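A sketch of the Collector side of that pattern, assuming the contrib distribution's tail_sampling processor (policy names, thresholds, and the Tempo endpoint are illustrative):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Keep every trace containing an error span.
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      # Keep every trace slower than 500 ms.
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 500}
      # Keep 10% of everything else (the happy path).
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
exporters:
  otlp:
    endpoint: tempo.observability.svc.cluster.local:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]
```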

A starting matrix:

| Environment | TRACER_RATIO | Notes |
|---|---|---|
| Local dev | 1 | See everything |
| Staging | 1 | Catch issues before prod |
| Production (low traffic, < 50 RPS) | 1 | Volume is fine |
| Production (high traffic) | 0.05–0.1 | Or sample 100% to a collector and tail-sample there |

Propagation across services

GoFr sets up a CompositeTextMapPropagator(TraceContext{}, Baggage{}), so the W3C traceparent and baggage headers are honored on incoming requests and written on outgoing requests through the GoFr HTTP service client. No extra code is needed:

```go
package main

import (
	"encoding/json"

	"gofr.dev/pkg/gofr"
)

func main() {
	app := gofr.New()

	app.AddHTTPService("payments", "http://payments.default.svc.cluster.local")

	app.GET("/checkout", func(ctx *gofr.Context) (any, error) {
		span := ctx.Trace("checkout.compute-total")
		defer span.End()

		// The downstream span on payments will be a child of this trace.
		// GetWithHeaders takes (ctx, path, queryParams, headers) and returns (*http.Response, error).
		httpResp, err := ctx.GetHTTPService("payments").
			GetWithHeaders(ctx, "/charge", nil, nil)
		if err != nil {
			return nil, err
		}
		defer httpResp.Body.Close()

		var resp any
		if err := json.NewDecoder(httpResp.Body).Decode(&resp); err != nil {
			return nil, err
		}

		return resp, nil
	})

	app.Run()
}
```

The downstream payments service — also a GoFr app pointed at the same exporter — will record its spans as children of the same trace. In Jaeger or Tempo, you'll see the full chain end-to-end.
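On the payments side, no propagation code is needed at all; a minimal sketch (the response body is illustrative):

```go
package main

import "gofr.dev/pkg/gofr"

func main() {
	app := gofr.New()

	// The incoming traceparent header from the checkout service is picked up
	// automatically, so this handler's spans join the same trace.
	app.GET("/charge", func(ctx *gofr.Context) (any, error) {
		return map[string]string{"status": "charged"}, nil
	})

	app.Run()
}
```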

Production tips

  • One exporter, many services: point all your services at the same collector. Querying a trace that hops services is the whole point.
  • Resource attributes: GoFr sets service.name from APP_NAME (default gofr-app). Set APP_NAME per-deployment so traces are attributable.
  • Don't sample on the client when you can sample on the collector — once dropped at the source, a trace is gone forever.
  • Watch the exporter error log: GoFr installs a custom OTel error handler (otelErrorHandler) that logs exporter failures via the standard logger. If you see these in volume, your collector is unreachable or overwhelmed.
  • Trace IDs in logs: include the trace ID in your logs to jump from a noisy log line to its trace. GoFr's structured logger and trace context share *gofr.Context, so you can read span.SpanContext().TraceID() and log it (see the sketch after this list).
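A sketch of that last tip, assuming GoFr's context logger is reachable as ctx.Logger; the route and log format are illustrative:

```go
package main

import (
	"go.opentelemetry.io/otel/trace"

	"gofr.dev/pkg/gofr"
)

func main() {
	app := gofr.New()

	app.GET("/orders", func(ctx *gofr.Context) (any, error) {
		// Read the active span's trace ID and attach it to a log line so a
		// log search can pivot straight to the trace in Jaeger or Tempo.
		traceID := trace.SpanFromContext(ctx).SpanContext().TraceID().String()

		// ctx.Logger is assumed to be GoFr's request-scoped logger.
		ctx.Logger.Infof("listing orders trace_id=%s", traceID)

		return map[string]string{"status": "ok"}, nil
	})

	app.Run()
}
```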

Verification

```bash
# 1. Confirm env is set inside the pod.
kubectl exec deploy/orders -- env | grep -E "TRACE_|TRACER_"

# 2. Generate traffic.
kubectl port-forward svc/orders 8080:80
for i in $(seq 1 50); do curl -s http://localhost:8080/checkout > /dev/null; done

# 3. Confirm spans are flowing in the collector or backend logs.
kubectl logs -n observability deploy/otel-collector | grep -i orders

# 4. Open Jaeger UI and search service=orders.
kubectl port-forward -n observability svc/jaeger-query 16686:16686
# http://localhost:16686
```

{% faq %} {% faq-item question="Tracing is configured but I see no spans in the backend." %} Check three things in order. First, GoFr logs "Exporting traces to <name> at <url>" on startup — if that line is absent, the exporter never initialized; verify TRACE_EXPORTER is one of otlp, jaeger, zipkin, or gofr. Second, port-forward to the collector and confirm gRPC port 4317 is reachable from the pod. Third, check TRACER_RATIO: a value of 0 would silently drop everything. {% /faq-item %} {% faq-item question="Why are my downstream service's spans showing up as separate traces?" %} The downstream call must go through GoFr's HTTP service client (app.AddHTTPService + ctx.GetHTTPService). A raw http.Client will not inject the traceparent header. If you must use a custom client, wrap its transport with otelhttp.NewTransport. {% /faq-item %} {% faq-item question="Do I need a Collector, or can I send directly to Jaeger/vendor?" %} You can send directly — GoFr's OTLP exporter speaks OTLP gRPC to anything that accepts it. A Collector becomes worth it when you want tail-based sampling, batching across many services, or to swap backends without redeploying every service. {% /faq-item %} {% /faq %}