Dapr 1.16.2

This update includes bug fixes:

HTTP API default CORS behavior
Scheduler External etcd with multiple client endpoints
Placement not cleaning internal state after host that had actors disconnects
Blocked Placement dissemination during high churn
Blocked Placement dissemination with high Scheduler dataset
Fix panic during actor deactivation
OpenTelemetry environment variables support
Fixing goavro bug due to codec state mutation
APP_API_TOKEN not passed in gRPC metadata for app callbacks
Fixed Pulsar OAuth token renewal
Fix Scheduler connection during non-graceful network interruptions
Prevent infinite loop when workflow state is corrupted or destroyed

HTTP API default CORS behavior

Problem

In the 1.16.0 release, a change altered the default CORS behavior of the Dapr HTTP API: CORS headers were added to all HTTP responses by default, and this new behavior could not be disabled.

Impact

This caused problems in scenarios where CORS is handled outside of the Dapr sidecar, because the sidecar always added its own CORS headers.

Solution

Part of the behavior introduced in 1.16.0 was reverted: the default value of the allowed-origins flag is now an empty string, which disables the CORS filter by default.
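
The effect of an empty default can be sketched with a small standard-library middleware. This is an illustration only, not Dapr's actual CORS filter: the `withCORS` and `corsHeaderFor` names, the request path, and the single-origin handling are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withCORS is a hypothetical middleware, not Dapr's implementation.
// An empty allowedOrigin disables the filter, so no CORS headers are added.
func withCORS(allowedOrigin string, next http.Handler) http.Handler {
	if allowedOrigin == "" {
		return next // filter disabled: responses pass through untouched
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", allowedOrigin)
		next.ServeHTTP(w, r)
	})
}

// corsHeaderFor sends one in-memory request through the middleware and
// returns the Access-Control-Allow-Origin header of the response.
func corsHeaderFor(allowedOrigin string) string {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1.0/healthz", nil)
	withCORS(allowedOrigin, ok).ServeHTTP(rec, req)
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Printf("default:  %q\n", corsHeaderFor(""))
	fmt.Printf("explicit: %q\n", corsHeaderFor("https://example.com"))
}
```

With the empty default, a proxy or gateway in front of the sidecar remains the only component that sets CORS headers.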

Scheduler External etcd with multiple client endpoints

Problem

Using Scheduler in non-embed mode with multiple etcd client endpoints was not working.

Impact

It was not possible to use multiple etcd endpoints for high availability when running Scheduler against an external etcd database.

Root Cause

The Scheduler etcd client endpoints CLI flag was typed as a string array rather than a string slice, causing a comma-separated value to be parsed as a single string rather than a slice of strings.

Solution

Changed the type of the etcd client endpoints CLI flag to be a string slice.
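
The difference between the two flag types can be sketched with the standard library alone. The functions below are simplified stand-ins for array-style and slice-style flag parsing, not the flag library's own code, and the endpoint addresses are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// asStringArray mimics array-style semantics: each occurrence of the flag
// is kept verbatim, so commas are not treated as separators.
func asStringArray(occurrences []string) []string {
	return append([]string(nil), occurrences...)
}

// asStringSlice mimics slice-style semantics: each occurrence is also split
// on commas, so a single flag can carry several values.
func asStringSlice(occurrences []string) []string {
	var out []string
	for _, o := range occurrences {
		for _, part := range strings.Split(o, ",") {
			if part != "" {
				out = append(out, part)
			}
		}
	}
	return out
}

func main() {
	// Hypothetical endpoints passed as one comma-separated flag value.
	in := []string{"etcd-0:2379,etcd-1:2379,etcd-2:2379"}
	fmt.Println("array-style endpoints:", len(asStringArray(in)))
	fmt.Println("slice-style endpoints:", len(asStringSlice(in)))
}
```

With array-style parsing the etcd client receives one unusable "endpoint" containing commas; with slice-style parsing it receives three.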

Placement not cleaning internal state after host that had actors disconnects

Problem

An actor host that had actors doesn't get properly cleaned up from placement after the sidecar is scaled down and the placement stream is closed.

Impact

This results in the placement server iterating over namespaces that no longer exist for every tick of the disseminate ticker.

Root Cause

The function requiresUpdateInPlacementTables should not set isActorHost to false once it has been set to true: once a host has had actors, the placement server keeps internal state for it, and the cleanup logic must run when the host disconnects.

Solution

Update the logic in requiresUpdateInPlacementTables.
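
The intended invariant is that the flag is "sticky". A minimal sketch, assuming a hypothetical `hostState` type that stands in for Placement's per-host bookkeeping (the real requiresUpdateInPlacementTables logic is not reproduced here):

```go
package main

import "fmt"

// hostState is a hypothetical stand-in for Placement's per-host state.
type hostState struct {
	isActorHost bool
}

// observe records one report from the host. Once isActorHost has been set
// to true, a later report without actors does not reset it.
func (h *hostState) observe(reportsActors bool) {
	h.isActorHost = h.isActorHost || reportsActors
}

// needsCleanup reports whether disconnect cleanup must run for this host.
func (h *hostState) needsCleanup() bool {
	return h.isActorHost
}

func main() {
	h := &hostState{}
	h.observe(true)  // host registers actor types
	h.observe(false) // host later reports no actors, e.g. while scaling down
	fmt.Println("cleanup required on disconnect:", h.needsCleanup())
}
```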

Blocked Placement dissemination during high churn

Problem

In high daprd churn scenarios, Placement would disseminate the actor table very slowly, or never.

Impact

Actors or workflows would fail to be activated, and existing actors or workflows would fail.

Root Cause

Placement used a small queue size (100) which, when exhausted, would cause a deadlock. Placement would also wait for the channel queue to be fully consumed before disseminating, which slowed down the dissemination process.

Solution

Increase the queue size to 10000 and change the dissemination logic to not wait for a fully consumed queue before disseminating.
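
How quickly a bounded queue is exhausted can be illustrated with a buffered channel. This sketch does not reproduce Placement's code; `enqueueAll` is a hypothetical helper, and the burst of 500 events is an arbitrary stand-in for sidecar churn.

```go
package main

import "fmt"

// enqueueAll tries to enqueue n events without blocking and returns how
// many were accepted before the queue was full. A blocking send would
// stall at that point until a consumer drained the queue.
func enqueueAll(queue chan int, n int) int {
	accepted := 0
	for i := 0; i < n; i++ {
		select {
		case queue <- i:
			accepted++
		default:
			return accepted
		}
	}
	return accepted
}

func main() {
	const burst = 500 // hypothetical number of membership events during churn
	fmt.Println("capacity 100:  ", enqueueAll(make(chan int, 100), burst))
	fmt.Println("capacity 10000:", enqueueAll(make(chan int, 10000), burst))
}
```

If the only consumer is itself waiting on the producer, the stall at a full queue becomes a deadlock, which is why both the capacity and the wait-for-empty condition were changed.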

Blocked Placement dissemination with high Scheduler dataset

Problem

Disseminations would hang for long periods of time when the Scheduler dataset was large.

Impact

Dissemination could take hours to complete, during which reminders were not delivered.

Root Cause

The migration of state store reminders to Scheduler reminders performs a full decoded scan of the Scheduler database, which takes a long time when there are many entries. Dissemination was blocked for the duration of the scan.

Solution

Limit the maximum time spent on the migration to 3 seconds. Expose a new global.reminders.skipMigration="true" Helm chart value, which skips the migration entirely.

Fix panic during actor deactivation

Problem

Daprd could panic during actor deactivation.

Impact

Daprd sidecar would crash, resulting in downtime for the application.

Root Cause

A race in the actor lock cached memory release and claiming logic meant a stale lock could be used during deactivation, double closing it, and causing a panic.

Solution

Tie the lock's lifecycle to the actor's lifecycle, ensuring the lock is only released when the actor is fully deactivated, and claimed with the actor itself.

OpenTelemetry environment variables support

Problem

OpenTelemetry OTEL_* environment variables were not fully respected, and dapr.io/env annotation parsing broke when values contained =.

Impact

OpenTelemetry resource attributes could not be reliably applied to the Dapr sidecar, degrading trace correlation with application containers, especially on Kubernetes. Configuring OTEL_RESOURCE_ATTRIBUTES via annotations did not work.

Root Cause

Resource creation used manual logic instead of the OpenTelemetry SDK’s environment-based resource detection.
The injector’s environment variable parsing treated = as a hard delimiter, breaking values that include =.

Solution

Adopt the OpenTelemetry SDK’s env-based resource detection so OTEL_* variables (including OTEL_RESOURCE_ATTRIBUTES) are honored.
Fix dapr.io/env parsing to allow values containing =.
Keep the Dapr app ID as the default service name when not overridden.
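
The parsing fix amounts to splitting each KEY=VALUE pair at the first = only. A minimal sketch of that idea, assuming hypothetical helper names (this is not the injector's code, and the attribute value is an example):

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvPair splits "KEY=VALUE" at the first "=" only, so the value may
// itself contain "=" characters.
func parseEnvPair(s string) (key, value string, ok bool) {
	return strings.Cut(s, "=")
}

// naiveParts shows what happens when "=" is treated as a hard delimiter.
func naiveParts(s string) []string {
	return strings.Split(s, "=")
}

func main() {
	pair := "OTEL_RESOURCE_ATTRIBUTES=service.namespace=prod"
	key, value, ok := parseEnvPair(pair)
	fmt.Printf("key=%q value=%q ok=%v\n", key, value, ok)
	fmt.Println("naive split yields", len(naiveParts(pair)), "parts")
}
```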

Fixing goavro bug due to codec state mutation

Problem

The goavro library had a bug where the codec state was mutated during decoding, causing the decoder to panic.

Impact

The goavro library would panic, causing the application to crash.

Root Cause

The goavro library did not correctly handle the codec state, causing it to panic when the codec state was mutated during decoding.

Solution

Update the goavro library to v2.14.1 to fix the bug. Take a more defensive approach, bringing back the old approach that always creates a new codec.

APP_API_TOKEN not passed in gRPC metadata for app callbacks

Problem

When APP_API_TOKEN was configured, the token was not being passed in gRPC metadata for app callbacks including:

PubSub subscriptions
Bindings
Jobs

This meant that applications using gRPC protocol could not authenticate incoming requests from Dapr when using the app API token security feature.

Impact

Applications that configured APP_API_TOKEN to secure their endpoints could not validate that incoming gRPC requests were from their Dapr sidecar. This broke the app API token authentication feature for gRPC applications.

Root Cause

The gRPC subscription delivery, binding, and job callback code paths were directly calling the app's gRPC client without going through the channel layer abstraction. The channel layer is responsible for injecting the APP_API_TOKEN in the dapr-api-token metadata header, but these direct calls bypassed this mechanism.

Solution

Centralized the APP_API_TOKEN injection logic in a helper function (AddAppTokenToContext) in the gRPC channel layer. Updated all gRPC app callback code paths (pubsub subscriptions, bindings, and job callbacks) to use this helper, ensuring the token is consistently added to the outgoing gRPC context metadata. Added comprehensive integration tests to verify token passing for all callback scenarios in both HTTP and gRPC protocols.
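
On the application side, a gRPC service can now check the dapr-api-token metadata entry on every callback. The sketch below uses a plain map[string][]string, which has the same shape as gRPC metadata, so that it stays dependency-free; `tokenValid` and the token values are hypothetical, and wiring this into a server interceptor is left out.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// tokenHeader is the metadata key in which Dapr sends the app API token.
const tokenHeader = "dapr-api-token"

// tokenValid checks incoming metadata against the expected token using a
// constant-time comparison. md stands in for the gRPC metadata of a call.
func tokenValid(md map[string][]string, expected string) bool {
	vals := md[tokenHeader]
	if expected == "" || len(vals) == 0 {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(vals[0]), []byte(expected)) == 1
}

func main() {
	expected := "example-token" // hypothetical value of APP_API_TOKEN
	fromDapr := map[string][]string{tokenHeader: {"example-token"}}
	fromUnknown := map[string][]string{}
	fmt.Println("sidecar callback accepted:", tokenValid(fromDapr, expected))
	fmt.Println("unauthenticated call accepted:", tokenValid(fromUnknown, expected))
}
```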

Fixed Pulsar OAuth token renewal

Problem

The Pulsar pubsub component was not renewing the OAuth token when it expired.

Impact

Applications using the Pulsar pubsub component could not publish or receive messages once the OAuth token had expired.

Root Cause

There was a bug in the component code that was preventing the OAuth token from being renewed when it expired.

Solution

Fixed the bug in the component code ensuring the OAuth token is renewed when it expires. Also added a test to verify the token renewal functionality. Fixed in https://github.com/dapr/components-contrib/pull/4079

Fix Scheduler connection during non-graceful network interruptions

Problem

A catastrophic failure of the Scheduler connection during a non-graceful network interruption would not cause the Dapr runtime to attempt to reconnect to Scheduler.

Impact

After a true host network interruption (e.g. unplugging the network cable), the Dapr runtime would only recover its connections to Scheduler after roughly 2 hours.

Root Cause

The gRPC KeepAlive parameters were not set correctly, causing the gRPC client to not detect broken connections in a timely manner.

Solution

The server and client KeepAlive parameters are now set to 3-second intervals with a 5-second timeout.

Prevent infinite loop when workflow state is corrupted or destroyed

Problem

Dapr workflows could enter an infinite reminder loop when the workflow state in the actor state store is corrupted or destroyed.

Impact

Dapr workflows would enter an infinite loop of reminder calls.

Root Cause

When a workflow reminder is triggered, the workflow state is loaded from the actor state store. If the state is corrupted or destroyed, the workflow would not be able to progress and would keep re-triggering the same reminder indefinitely.

Solution

Do not retry the reminder if the workflow state cannot be loaded, and instead log an error and exit the workflow execution.