docs/release_notes/v1.16.2.md
This update includes bug fixes:
The 1.16.0 release changed the default CORS behavior of the Dapr HTTP API: CORS headers were added to all HTTP responses by default, and this behavior could not be disabled.
This caused problems in scenarios where CORS is handled outside of the Dapr sidecar, because the Dapr Sidecar always added CORS headers.
Reverted part of the behavior introduced in 1.16.0: the default value of the allowed-origins flag is now an empty string, which disables the CORS filter by default.
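The effect of an empty default can be sketched as follows. This is a simplified stand-in and not Dapr's middleware: the names `withCORS` and `corsHeader` are hypothetical, and a single origin is echoed back rather than matched against the request's Origin header.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withCORS wraps next with a CORS filter only when allowedOrigins is non-empty.
// Hypothetical helper for illustration; real CORS handling also checks the
// request's Origin header.
func withCORS(allowedOrigins string, next http.Handler) http.Handler {
	if allowedOrigins == "" {
		return next // filter disabled: no CORS headers are added
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", allowedOrigins)
		next.ServeHTTP(w, r)
	})
}

// corsHeader returns the Access-Control-Allow-Origin header produced for a
// plain GET request under the given allowedOrigins setting.
func corsHeader(allowedOrigins string) string {
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	withCORS(allowedOrigins, app).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	// The origin below is a made-up example value.
	fmt.Printf("%q %q\n", corsHeader(""), corsHeader("https://app.example"))
}
```

With the empty default the response carries no CORS header, so a gateway or the application itself can own CORS handling.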
Using Scheduler in non-embed mode with multiple etcd client endpoints was not working.
It was not possible to use multiple etcd endpoints for high availability with an external etcd database for scheduler.
The Scheduler etcd client endpoints CLI flag was typed as a string array, rather than a string slice, causing the given value to be parsed as a single string rather than a slice of strings.
Changed the type of the etcd client endpoints CLI flag to be a string slice.
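The difference between the two flag types can be illustrated with a small sketch. The functions below are simplified stand-ins written with the standard library, not the flag library Dapr uses, and the endpoint addresses are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAsArray mimics a string-array style flag: each occurrence of the flag
// contributes exactly one element, and commas are left untouched.
func parseAsArray(values ...string) []string {
	out := make([]string, 0, len(values))
	return append(out, values...)
}

// parseAsSlice mimics a string-slice style flag: each occurrence is split on
// commas, so a single flag value can carry several endpoints.
func parseAsSlice(values ...string) []string {
	var out []string
	for _, v := range values {
		for _, part := range strings.Split(v, ",") {
			if part != "" {
				out = append(out, part)
			}
		}
	}
	return out
}

func main() {
	// Hypothetical endpoint list, for illustration only.
	raw := "etcd-0:2379,etcd-1:2379,etcd-2:2379"
	fmt.Println(len(parseAsArray(raw))) // 1: the whole string is one "endpoint"
	fmt.Println(len(parseAsSlice(raw))) // 3: one element per endpoint
}
```

Under the array-style parsing, the etcd client would be handed one malformed endpoint instead of three valid ones.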
An actor host that had actors was not properly cleaned up from placement after the sidecar was scaled down and the placement stream was closed.
This results in the placement server iterating over namespaces that no longer exist for every tick of the disseminate ticker.
The function requiresUpdateInPlacementTables should not set isActorHost to false once it is set to true, because once a host has actors the placement server keeps internal state for it and cleanup logic must be executed once the host disconnects.
Update the logic in requiresUpdateInPlacementTables.
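The "sticky" flag described above can be sketched as below. The type `hostState` and its methods are hypothetical and only illustrate the idea; they do not reproduce the actual signature or body of requiresUpdateInPlacementTables.

```go
package main

import "fmt"

// hostState is a hypothetical stand-in for the per-host state kept for placement.
type hostState struct {
	isActorHost bool
}

// observe records whether the host currently reports actors.
// Once isActorHost is true it is never reset to false, so disconnect-time
// cleanup still runs for hosts that had actors at any point.
func (h *hostState) observe(reportsActors bool) {
	if reportsActors {
		h.isActorHost = true
	}
}

// needsCleanup reports whether cleanup must run when the host disconnects.
func (h *hostState) needsCleanup() bool {
	return h.isActorHost
}

func main() {
	h := &hostState{}
	h.observe(true)  // host registers actors
	h.observe(false) // a later report carries no actors
	fmt.Println(h.needsCleanup())
}
```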
Placement would disseminate the actor table very slowly, or never, in high daprd churn scenarios.
Actors or workflows would fail to be activated, and existing actors or workflows would fail.
Placement used a "small" (100) queue size which, when exhausted, would cause a deadlock. Placement would also wait for a fully consumed channel queue before disseminating, slowing down the dissemination process.
Increase the queue size to 10000 and change the dissemination logic to not wait for a fully consumed queue before disseminating.
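As a rough illustration of buffer exhaustion, the sketch below fills two buffered channels without a consumer. It is not the placement code: `trySend` and `fill` are hypothetical helpers, and the non-blocking send is used only to make the full-buffer condition observable without hanging the example.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the queue accepted
// the item. A plain `q <- v` would block here once the buffer is full.
func trySend(q chan int, v int) bool {
	select {
	case q <- v:
		return true
	default:
		return false
	}
}

// fill pushes n items with no consumer running and returns how many fit.
func fill(q chan int, n int) int {
	accepted := 0
	for i := 0; i < n; i++ {
		if trySend(q, i) {
			accepted++
		}
	}
	return accepted
}

func main() {
	small := make(chan int, 100)
	large := make(chan int, 10000)
	fmt.Println(fill(small, 500)) // 100
	fmt.Println(fill(large, 500)) // 500
}
```

A larger buffer only defers the point at which senders block, which is presumably why the release also changes the dissemination logic rather than relying on the queue size alone.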
Disseminations would hang for long periods of time when the Scheduler dataset was large.
Dissemination could take up to hours to complete, causing reminders to not be delivered for a long period of time.
The reminder migration of state store to scheduler reminders does a full decoded scan of the Scheduler database, which would take a long time if there were many entries. During this time the dissemination would be blocked.
Limit the maximum time spent doing the migration to 3 seconds.
Expose a new global.reminders.skipMigration="true" helm chart value which will skip the migration entirely.
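Bounding the time spent in a long scan is commonly done with a context deadline. The sketch below assumes a hypothetical `migrate` loop that checks the context between entries; it shows the general pattern, not the actual migration code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// migrate is a hypothetical migration loop: it handles entries one at a time
// and stops early once ctx expires. It returns how many entries were handled.
func migrate(ctx context.Context, entries int, perEntry time.Duration) int {
	done := 0
	for i := 0; i < entries; i++ {
		select {
		case <-ctx.Done():
			return done // budget exhausted: give up so dissemination can proceed
		case <-time.After(perEntry): // simulated work for one entry
			done++
		}
	}
	return done
}

// migrateWithBudget caps the total time spent in migrate.
func migrateWithBudget(budget time.Duration, entries int, perEntry time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return migrate(ctx, entries, perEntry)
}

func main() {
	// The release uses a 3 second budget; a shorter one keeps this demo quick.
	n := migrateWithBudget(50*time.Millisecond, 1000, 10*time.Millisecond)
	fmt.Println(n < 1000) // true: the scan stopped before finishing
}
```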
Daprd could panic during actor deactivation.
Daprd sidecar would crash, resulting in downtime for the application.
A race in the actor lock cached memory release and claiming logic meant a stale lock could be used during deactivation, double closing it, and causing a panic.
Tie the lock's lifecycle to the actor's lifecycle, ensuring the lock is only released when the actor is fully deactivated, and claimed with the actor itself.
OpenTelemetry OTEL_* environment variables were not fully respected, and dapr.io/env annotation parsing broke when values contained =.
OpenTelemetry resource attributes could not be reliably applied to the Dapr sidecar, degrading trace correlation with application containers, especially on Kubernetes. Configuring OTEL_RESOURCE_ATTRIBUTES via annotations did not work.
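The annotation parsing problem is easiest to see with a value that itself contains =. A minimal sketch of a tolerant parser splits each entry at the first = only. The function name `parseEnvPair` and the attribute value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvPair splits a KEY=VALUE entry at the first "=" only, so the value
// may itself contain "=" characters.
func parseEnvPair(entry string) (key, value string, ok bool) {
	key, value, ok = strings.Cut(entry, "=")
	if !ok || key == "" {
		return "", "", false
	}
	return key, value, true
}

func main() {
	// Example value chosen for illustration.
	k, v, ok := parseEnvPair("OTEL_RESOURCE_ATTRIBUTES=service.name=orders")
	fmt.Println(k, v, ok)
}
```

Splitting on every = instead would yield three fields for this entry and lose the attribute value.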
The dapr.io/env annotation parsing treated = as a hard delimiter, breaking values that include =.
All OTEL_* variables (including OTEL_RESOURCE_ATTRIBUTES) are honored.
Fixed dapr.io/env parsing to allow values containing =.
The goavro library had a bug where the codec state was mutated during decoding, causing the decoder to panic.
The goavro library would panic, causing the application to crash.
The goavro library did not correctly handle the codec state, causing it to panic when the codec state was mutated during decoding.
Update the goavro library to v2.14.1 to fix the bug. Take a more defensive approach, bringing back the old approach that always creates a new codec.
When APP_API_TOKEN was configured, the token was not being passed in gRPC metadata for app callbacks, including pubsub subscription delivery, bindings, and job callbacks.
This meant that applications using gRPC protocol could not authenticate incoming requests from Dapr when using the app API token security feature.
Applications that configured APP_API_TOKEN to secure their endpoints could not validate that incoming gRPC requests were from their Dapr sidecar. This broke the app API token authentication feature for gRPC applications.
The gRPC subscription delivery, binding, and job callback code paths were directly calling the app's gRPC client without going through the channel layer abstraction. The channel layer is responsible for injecting the APP_API_TOKEN in the dapr-api-token metadata header, but these direct calls bypassed this mechanism.
Centralized the APP_API_TOKEN injection logic in a helper function (AddAppTokenToContext) in the gRPC channel layer. Updated all gRPC app callback code paths (pubsub subscriptions, bindings, and job callbacks) to use this helper, ensuring the token is consistently added to the outgoing gRPC context metadata. Added comprehensive integration tests to verify token passing for all callback scenarios in both HTTP and gRPC protocols.
The pulsar pubsub component was not renewing the OAuth token when it expired.
Applications using the pulsar pubsub component could not receive/publish messages when the OAuth token expired.
There was a bug in the component code that was preventing the OAuth token from being renewed when it expired.
Fixed the bug in the component code ensuring the OAuth token is renewed when it expires. Also added a test to verify the token renewal functionality. Fixed in https://github.com/dapr/components-contrib/pull/4079
Catastrophic failure of scheduler connection during non-graceful network interruptions would not cause the dapr runtime to attempt to reconnect to Scheduler.
A true host network interruption (e.g. unplugging the network cable) would cause the dapr runtime to only recover connections to Scheduler after roughly 2 hours.
The gRPC KeepAlive parameters were not set correctly, causing the gRPC client to not detect broken connections in a timely manner.
The server and client KeepAlive parameters are now set to 3 second intervals with a 5 second timeout.
Dapr workflows could enter an infinite reminder loop when the workflow state in the actor state store is corrupted or destroyed.
Dapr workflows would enter an infinite loop of reminder calls.
When a workflow reminder is triggered, the workflow state is loaded from the actor state store. If the state is corrupted or destroyed, the workflow would not be able to progress and would keep re-triggering the same reminder indefinitely.
Do not retry the reminder if the workflow state cannot be loaded, and instead log an error and exit the workflow execution.
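The change in retry behavior can be sketched as follows. `handleReminder` and `errStateUnavailable` are hypothetical names, and treating execution errors as retryable is an assumption of this sketch rather than something stated in these notes.

```go
package main

import (
	"errors"
	"fmt"
)

// errStateUnavailable is a hypothetical error for corrupted or missing state.
var errStateUnavailable = errors.New("workflow state could not be loaded")

// handleReminder is a hypothetical reminder handler. It reports whether the
// reminder should be retried. A state load failure is logged and the reminder
// is dropped, since retrying cannot repair corrupted or destroyed state.
func handleReminder(loadState func() (string, error), run func(state string) error) (retry bool) {
	state, err := loadState()
	if err != nil {
		fmt.Println("error: abandoning workflow execution:", err)
		return false
	}
	if err := run(state); err != nil {
		return true // assumption: execution errors remain retryable
	}
	return false
}

func main() {
	broken := func() (string, error) { return "", errStateUnavailable }
	noop := func(string) error { return nil }
	fmt.Println(handleReminder(broken, noop)) // false: no retry loop
}
```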