Dapr 1.16.13

docs/release_notes/v1.16.13.md

This update includes a Go version bump and the following bug fixes:

Go version updated to 1.25.9

Problem

Dapr 1.16.12 was built with Go 1.25.8. Go 1.25.9 includes security fixes to the crypto/x509, crypto/tls, archive/tar, and html/template packages.

Impact

Users running Dapr built with Go 1.25.8 may be exposed to known vulnerabilities that have been patched in Go 1.25.9.

Solution

Updated the Go version from 1.25.8 to 1.25.9.

Scheduled jobs stop firing after a scheduler pod restart

Problem

When a scheduler pod in a multi-node cluster restarted, all scheduled jobs could stop firing for an extended or indefinite period. The sidecar's metadata endpoint continued to report connected scheduler addresses, but no job triggers were delivered to the application.

Impact

Any deployment running the scheduler service with multiple replicas was affected. During routine operations that cause a scheduler pod to restart, applications stopped receiving scheduled job triggers until an unrelated cluster event happened to re-establish the connections.

Root Cause

The sidecar maintains a streaming connection to each scheduler pod for receiving job triggers. These connections were managed by a shared runner. When any single connection encountered an error (such as the replaced scheduler pod briefly accepting then closing the connection during startup), the runner cancelled all connections, including healthy ones to the other scheduler pods. No reconnection was attempted because the host-watching mechanism had no reason to emit a new event when cluster membership had not changed.

Solution

Each per-scheduler streaming connector now retries independently on failure with a half-second backoff. A transient failure on one scheduler connection no longer affects healthy connections to other scheduler pods.
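The independent-retry behavior described above can be illustrated with a minimal Go sketch. This is not Dapr's actual implementation: `connectFunc`, `runConnector`, `runAll`, `simulate`, and the `scheduler-N` addresses are hypothetical names chosen for illustration. Only the half-second backoff and the per-connection isolation come from the notes above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// connectFunc is a hypothetical stand-in for opening and serving one
// streaming connection to a scheduler pod. It returns when the stream ends.
type connectFunc func(ctx context.Context, addr string) error

// runConnector keeps a single scheduler stream alive. When the stream fails
// it waits for the backoff and reconnects; it stops only when ctx is cancelled.
func runConnector(ctx context.Context, addr string, connect connectFunc, backoff time.Duration) {
	for {
		err := connect(ctx, addr)
		if ctx.Err() != nil {
			return // shutting down, do not retry
		}
		fmt.Printf("stream to %s ended (%v), retrying in %s\n", addr, err, backoff)
		select {
		case <-ctx.Done():
			return
		case <-time.After(backoff):
		}
	}
}

// runAll starts one connector per address. A failure in one connector never
// cancels the others, because each goroutine owns its own retry loop.
func runAll(ctx context.Context, addrs []string, connect connectFunc, backoff time.Duration) {
	var wg sync.WaitGroup
	for _, addr := range addrs {
		wg.Add(1)
		go func(addr string) {
			defer wg.Done()
			runConnector(ctx, addr, connect, backoff)
		}(addr)
	}
	wg.Wait()
}

// simulate models a restarted pod (scheduler-1) that accepts and then closes
// its first connection. It returns the number of connection attempts per address.
func simulate() map[string]int {
	ctx, cancel := context.WithTimeout(context.Background(), 1200*time.Millisecond)
	defer cancel()

	var mu sync.Mutex
	attempts := map[string]int{}
	connect := func(ctx context.Context, addr string) error {
		mu.Lock()
		attempts[addr]++
		n := attempts[addr]
		mu.Unlock()
		if addr == "scheduler-1" && n == 1 {
			return errors.New("connection closed during startup")
		}
		<-ctx.Done() // a healthy stream stays open until shutdown
		return ctx.Err()
	}

	runAll(ctx, []string{"scheduler-0", "scheduler-1", "scheduler-2"}, connect, 500*time.Millisecond)
	return attempts
}

func main() {
	fmt.Println(simulate())
}
```

In this sketch the streams to `scheduler-0` and `scheduler-2` are opened once and stay up, while `scheduler-1` is retried after the backoff. The earlier shared-runner design would have torn down all three on the first error.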

Pulsar pub/sub ignores processMode from component metadata and lacks async backpressure

Problem

The Pulsar pub/sub component ignored the processMode parameter when set in component metadata (YAML). The parameter was only read from subscription request metadata, so users who configured processMode: async or processMode: sync in the component YAML were silently running in the default mode. Additionally, async mode spawned a new goroutine for every message, with no limit on concurrency.

Impact

Applications that configured processMode in the Pulsar component YAML were not running in the expected processing mode. Users who set processMode: sync expecting synchronous, ordered processing were actually running in async mode.

In async mode, every incoming message spawned a new goroutine with no upper bound. Under high message rates, this led to an ever-growing backlog of unacknowledged messages (~30k observed in production), excessive memory usage, and potential OOM crashes. The maxConcurrentHandlers metadata field controlled a channel buffer size but did not limit the number of goroutines actually running concurrently.

Root Cause

The processMode field was missing from the pulsarMetadata struct, so it was never parsed from component metadata. It was only read from the per-subscription request metadata, which most users do not set.

In async mode, an err variable shared across goroutines caused a data race, and setting maxConcurrentHandlers to 0 caused a deadlock instead of falling back to a default value.

Solution

The processMode parameter is now correctly read from component metadata, with per-subscription metadata able to override it. Invalid values are rejected at initialization time.
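The precedence rule above (component metadata first, per-subscription metadata as an override, invalid values rejected) can be sketched in Go as follows. The function `resolveProcessMode` and the plain `map[string]string` metadata are assumptions made for illustration and do not reflect the component's real types. For simplicity the sketch validates in a single step, whereas the notes only state that invalid values are rejected at initialization.

```go
package main

import "fmt"

const (
	processModeAsync = "async"
	processModeSync  = "sync"
)

// resolveProcessMode picks the effective mode: the default, unless the
// component metadata sets processMode, unless the subscription metadata
// overrides it. Any value other than "async" or "sync" is an error.
// Hypothetical helper, not the component's real API.
func resolveProcessMode(componentMeta, subscriptionMeta map[string]string, defaultMode string) (string, error) {
	mode := defaultMode
	if v := componentMeta["processMode"]; v != "" {
		mode = v
	}
	if v := subscriptionMeta["processMode"]; v != "" {
		mode = v // per-subscription metadata wins
	}
	switch mode {
	case processModeAsync, processModeSync:
		return mode, nil
	default:
		return "", fmt.Errorf("invalid processMode %q: expected %q or %q",
			mode, processModeAsync, processModeSync)
	}
}

func main() {
	component := map[string]string{"processMode": "sync"}

	// The default is assumed to be async here, as the Impact section implies.
	mode, err := resolveProcessMode(component, nil, processModeAsync)
	fmt.Println(mode, err)

	mode, err = resolveProcessMode(component, map[string]string{"processMode": "async"}, processModeAsync)
	fmt.Println(mode, err)

	_, err = resolveProcessMode(map[string]string{"processMode": "parallel"}, nil, processModeAsync)
	fmt.Println(err)
}
```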

Async mode now enforces a concurrency limit that applies backpressure when all handler slots are full, preventing unbounded goroutine growth. Setting maxConcurrentHandlers to 0 falls back to the default (100) instead of deadlocking.
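One common way to get this kind of backpressure in Go is a buffered channel used as a counting semaphore. The sketch below illustrates that general technique under stated assumptions and is not the component's actual code: `effectiveLimit`, `processAll`, and `demoPeak` are hypothetical names. Treating negative values like 0 is also an assumption, since the notes only mention 0.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// defaultMaxConcurrentHandlers mirrors the default of 100 stated in the notes.
const defaultMaxConcurrentHandlers = 100

// effectiveLimit falls back to the default when the configured value is 0.
// Negative values are treated the same way (an assumption of this sketch).
func effectiveLimit(configured int) int {
	if configured <= 0 {
		return defaultMaxConcurrentHandlers
	}
	return configured
}

// processAll runs handle for every message in its own goroutine, but never
// more than the effective limit at once. It returns the peak number of
// handlers that were running concurrently.
func processAll(messages []string, configured int, handle func(string)) int {
	slots := make(chan struct{}, effectiveLimit(configured))
	var wg sync.WaitGroup
	var inFlight, peak atomic.Int64

	for _, msg := range messages {
		slots <- struct{}{} // blocks while all slots are taken: backpressure
		wg.Add(1)
		go func(msg string) {
			defer wg.Done()
			defer func() { <-slots }() // free the slot after the handler finishes

			n := inFlight.Add(1)
			for { // record the highest concurrency seen so far
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			handle(msg)
			inFlight.Add(-1)
		}(msg)
	}
	wg.Wait()
	return int(peak.Load())
}

// demoPeak feeds count messages through processAll with a slow handler.
func demoPeak(count, configured int) int {
	messages := make([]string, count)
	for i := range messages {
		messages[i] = fmt.Sprintf("msg-%d", i)
	}
	return processAll(messages, configured, func(string) {
		time.Sleep(10 * time.Millisecond)
	})
}

func main() {
	fmt.Println("limit for 0:", effectiveLimit(0))
	fmt.Println("peak with limit 4:", demoPeak(20, 4))
}
```

Because the receive loop blocks on `slots` instead of spawning another goroutine, a slow application slows down message intake rather than growing memory without bound.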

Additionally, a data race in async mode was fixed, and graceful shutdown now waits for in-flight handlers before returning.
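The two fixes mentioned here follow a familiar Go pattern, sketched below under stated assumptions: each handler goroutine declares its own `err` instead of writing to a shared variable, and shutdown waits on a `sync.WaitGroup` until in-flight handlers finish. The `consumer` type and its methods are hypothetical and are not taken from the Pulsar component.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// consumer is a hypothetical stand-in for a subscription's async dispatcher.
type consumer struct {
	wg     sync.WaitGroup
	mu     sync.Mutex
	failed []string // messages whose handler returned an error
}

// dispatch runs handle for msg in a new goroutine and tracks it as in-flight.
func (c *consumer) dispatch(msg string, handle func(string) error) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		// err is scoped to this goroutine, so handlers cannot race on it.
		if err := handle(msg); err != nil {
			c.mu.Lock()
			c.failed = append(c.failed, msg)
			c.mu.Unlock()
		}
	}()
}

// close waits for all in-flight handlers before returning, then reports
// which messages failed.
func (c *consumer) close() []string {
	c.wg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.failed...)
}

// runDemo dispatches 50 messages, one of which fails, and shuts down.
func runDemo() []string {
	c := &consumer{}
	handle := func(msg string) error {
		if msg == "msg-7" {
			return errors.New("handler failed")
		}
		return nil
	}
	for i := 0; i < 50; i++ {
		c.dispatch(fmt.Sprintf("msg-%d", i), handle)
	}
	return c.close()
}

func main() {
	fmt.Println("failed:", runDemo())
}
```

Running a sketch like this with `go run -race` is a quick way to confirm that no handler shares mutable state without synchronization.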