docs/release_notes/v1.16.1.md
This update includes bug fixes:
When running Dapr with an `--app-port` specified but no application listening on that port (either because no server exists or because the server starts late), the actor runtime would initialize immediately, before the app channel was ready. This created a race condition in which actors tried to communicate with an application that was not available yet, resulting in repeated error logs:
```
WARN[0064] Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…
DEBU[0064] Error for operation DaprBuiltInActorNotFoundRetries was: failed to lookup actor: api error: code = FailedPrecondition desc = did not find address for actor
```
This made for a poor user experience, with confusing error messages when users specified an `--app-port` but had no application listening on that port.
The actor runtime initialization was occurring before the application channel was ready, creating a race condition where actors attempted to communicate with an unavailable application.
Defer actor runtime initialization until the application channel is ready. The runtime now logs `waiting for application to listen on port XXXX` messages instead of the confusing error logs above.
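As a rough illustration of the deferred-initialization idea (a minimal sketch, not the actual daprd code; `waitForAppPort` and the port value are hypothetical), the runtime effectively blocks until the app port accepts connections before bringing up actors:

```go
// Minimal sketch, assuming a TCP app channel; not the daprd implementation.
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForAppPort blocks until something accepts TCP connections on the
// given port, logging a clear "waiting" message between attempts.
func waitForAppPort(port int) {
	addr := fmt.Sprintf("localhost:%d", port)
	for {
		conn, err := net.DialTimeout("tcp", addr, time.Second)
		if err == nil {
			conn.Close()
			return
		}
		fmt.Printf("waiting for application to listen on port %d\n", port)
		time.Sleep(time.Second)
	}
}

func main() {
	waitForAppPort(8080) // hypothetical --app-port value
	// Only now is it safe to initialize the actor runtime.
	fmt.Println("app channel ready; initializing actor runtime")
}
```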
The sidecar injector crashes with the error `dapr-scheduler-server StatefulSet not found` when the scheduler is disabled via the Helm chart (`global.scheduler.enabled: false`).
The crash prevents the sidecar injector from functioning correctly when the scheduler is disabled, disrupting deployments.
A previous change caused the `dapr-scheduler-server` StatefulSet to be removed when the scheduler was disabled, instead of scaling it to 0 as originally intended. The injector, hardcoded in `injector.go` to check for the StatefulSet, fails when it is not found.
Revert to scaling the `dapr-scheduler-server` StatefulSet to 0 replicas in the Helm chart when the scheduler is disabled, instead of removing it.
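For reference, the Helm values excerpt that triggers this path; with the fix, it scales the StatefulSet down rather than deleting the object the injector looks up:

```yaml
# Helm values excerpt. With the fix, disabling the scheduler scales
# dapr-scheduler-server to 0 replicas instead of removing the
# StatefulSet that the injector checks for.
global:
  scheduler:
    enabled: false
```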
When an application's health check transitioned from unhealthy to healthy, the scheduler clients were incorrectly reconfigured to stop watching for actor reminder jobs.
The misconfigured scheduler clients caused workflows to stop executing, because the reminders that drive them no longer fired.
On an application health change, daprd could trigger an actor-type update with an empty slice, which caused a scheduler client reconfiguration. Because the actor types had not actually changed, daprd never received a new version of the placement table. Sending the update had already wiped the known actor types from the scheduler client, and since no acknowledgement with a new table version ever arrived from the placement server, the scheduler client was never repopulated with the actor types.
Prevent any changes to hosted actor types when the input slice is empty, as sketched below.
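A minimal sketch of that guard, with illustrative names (the real daprd types differ):

```go
// Illustrative only: ignore empty actor-type updates so a transient
// app-health flap cannot wipe the scheduler client's known actor types.
package main

import "fmt"

// schedulerClient is a stand-in for daprd's scheduler client state.
type schedulerClient struct {
	actorTypes []string
}

// updateHostedActorTypes skips updates carrying an empty slice.
func (s *schedulerClient) updateHostedActorTypes(types []string) {
	if len(types) == 0 {
		return // empty input: keep the existing actor types
	}
	s.actorTypes = types
}

func main() {
	s := &schedulerClient{actorTypes: []string{"myactor"}}
	s.updateHostedActorTypes(nil) // a health flap produces an empty update: no-op
	fmt.Println(s.actorTypes)     // still [myactor]
}
```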
The Scheduler Etcd client port is not available when running in Dapr CLI standalone mode.
Cannot perform Scheduler Etcd admin operations in Dapr CLI standalone mode.
The Scheduler's etcd client port listens only on localhost.
The Scheduler etcd client listen address is now configurable via the `--scheduler-etcd-client-listen-address` CLI flag, so the port can be exposed when running in standalone mode.
The Scheduler would always treat `--etcd-embed` as true, even when the Helm chart set it to false.
External etcd addresses could not be used, since the Scheduler always assumed embedded etcd was in use.
The Helm template rendered the boolean as a separate argument rather than inline with the flag.
The template format string was fixed so that `.etcdEmbed` can be set to false, as sketched below.
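An illustrative fragment of the kind of template change involved (not the chart's exact lines):

```yaml
# Hypothetical Helm template fragment, not the chart verbatim.
args:
  # Before (broken): the flag and its value rendered as two list items,
  # so the boolean value was treated as a separate argument.
  # - "--etcd-embed"
  # - "{{ .Values.etcdEmbed }}"
  # After (fixed): the flag and its boolean are rendered inline.
  - "--etcd-embed={{ .Values.etcdEmbed }}"
```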
The component init timeout was checked only after the component reporter had already been invoked.
This ordering could lead to false positives: daprd could report a successful initialization and then return an error once the timeout check ran.
Move the timeout check to immediately after the actual component initialization and before the component reporter.
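A sketch of the corrected ordering, with hypothetical names (`initComponent`, `report`); the point is only that the timeout is resolved before anything is reported:

```go
// Illustrative ordering only, not daprd's code: resolve the timeout
// before invoking the reporter, so a timed-out init is never reported
// as a success.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

func initComponent(ctx context.Context, timeout time.Duration, init func() error, report func(error)) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- init() }()

	var err error
	select {
	case err = <-done: // init finished in time
	case <-ctx.Done(): // the timeout check happens first...
		err = errors.New("component init timed out")
	}
	report(err) // ...and only then is the outcome reported
	return err
}

func main() {
	err := initComponent(context.Background(), time.Second,
		func() error { time.Sleep(2 * time.Second); return nil },
		func(err error) { fmt.Println("reported:", err) },
	)
	fmt.Println("returned:", err)
}
```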
The `pubsub.kafka` component failed to publish Avro messages in Dapr 1.16, breaking existing workflows.
Avro messages could not be published correctly, causing failures in Kafka message pipelines and potential data loss or dead-lettering issues.
The Kafka pubsub component did not correctly create codecs in the `SchemaRegistryClient`. Additionally, the goavro library had a bug converting default null values that broke legitimate schemas.
Enabled codec creation in the Kafka `SchemaRegistryClient` and upgraded `github.com/linkedin/goavro/v2` from v2.13.1 to v2.14.0 to fix null value handling. The metadata options `useAvroJson` and `excludeHeaderMetaRegex` were validated to ensure correct message encoding and dead-letter handling. Manual tests confirmed that Avro and JSON message publication works as expected.
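For context, a component manifest of the shape these options apply to; the values shown are illustrative, not defaults:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: kafka-pubsub
spec:
  type: pubsub.kafka
  version: v1
  metadata:
  - name: brokers
    value: "localhost:9092"
  - name: schemaRegistryURL        # illustrative endpoint for Avro codecs
    value: "http://localhost:8081"
  - name: useAvroJson
    value: "true"
```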
Some SFTP servers require files to be closed before they become available for reading. Without closing, read operations could fail or return incomplete data.
SFTP file reads could fail or return incomplete data on certain servers, causing downstream processing issues.
The SFTP component did not explicitly close files after writing, which some servers require to make files readable.
Updated the SFTP component to close files after writing, ensuring they are available for reading on all supported servers.
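A sketch of the write-then-close pattern using `github.com/pkg/sftp` (connection setup elided; `writeThenRead` is an illustrative helper, not the component's API):

```go
// Sketch, not the component's actual code: some SFTP servers only make a
// file readable once the writing handle has been closed.
package example

import (
	"io"

	"github.com/pkg/sftp"
)

// writeThenRead writes data, closes the file, then reopens it for reading.
func writeThenRead(client *sftp.Client, path string, data []byte) ([]byte, error) {
	f, err := client.Create(path)
	if err != nil {
		return nil, err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return nil, err
	}
	// Close before reading: without this, reads can fail or return
	// incomplete data on some servers.
	if err := f.Close(); err != nil {
		return nil, err
	}
	r, err := client.Open(path)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}
```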
The AWS Secrets Manager component failed to correctly parse YAML metadata, causing boolean fields like `multipleKeyValuesPerSecret` to be misinterpreted.
Incorrect metadata parsing could lead to misconfiguration, preventing secrets from being retrieved or handled properly.
The component used a JSON marshal/unmarshal approach in `getSecretManagerMetadata`, which did not handle string-to-boolean conversion correctly for YAML metadata.
Replaced the JSON marshal/unmarshal with `kitmd.DecodeMetadata` to correctly parse YAML metadata and convert string fields to their proper types, ensuring `multipleKeyValuesPerSecret` works as expected.
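A sketch of the decoding change, assuming `kitmd` is `github.com/dapr/kit/metadata` (the struct and field tag here are illustrative):

```go
// Illustrative sketch of decoding string metadata into typed fields.
package example

import (
	kitmd "github.com/dapr/kit/metadata"
)

type secretsManagerMetadata struct {
	// Component metadata arrives as strings ("true"/"false"), so the
	// decoder must handle string-to-bool conversion.
	MultipleKeyValuesPerSecret bool `mapstructure:"multipleKeyValuesPerSecret"`
}

func getSecretManagerMetadata(raw map[string]string) (*secretsManagerMetadata, error) {
	var md secretsManagerMetadata
	// DecodeMetadata performs the type conversions that a plain JSON
	// marshal/unmarshal round-trip does not.
	if err := kitmd.DecodeMetadata(raw, &md); err != nil {
		return nil, err
	}
	return &md, nil
}
```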
After migrating to the AWS v2 Kafka client, a new client was created for every message published, causing inefficiency and unnecessary resource usage.
Frequent client creation led to performance degradation, increased connection overhead, and potential resource exhaustion during high-throughput message publishing.
The AWS v2 client integration did not implement client reuse, resulting in a new client being instantiated for each publish operation.
Updated the Kafka component to reuse clients instead of creating a new one for each message, improving performance and resource efficiency.
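The reuse pattern, as a minimal sketch with stand-in types (the component's actual client and setup differ):

```go
// Illustrative client-reuse pattern: construct the client once and share
// it across publishes, instead of once per message.
package main

import (
	"fmt"
	"sync"
)

type kafkaClient struct{} // stand-in for the real AWS v2 Kafka client

func newKafkaClient() (*kafkaClient, error) { return &kafkaClient{}, nil }

type publisher struct {
	once   sync.Once
	client *kafkaClient
	err    error
}

// getClient lazily creates the shared client exactly once.
func (p *publisher) getClient() (*kafkaClient, error) {
	p.once.Do(func() {
		p.client, p.err = newKafkaClient()
	})
	return p.client, p.err
}

func main() {
	p := &publisher{}
	c1, _ := p.getClient()
	c2, _ := p.getClient()
	fmt.Println(c1 == c2) // true: the same client is reused across publishes
}
```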
The Kafka AWS authentication configuration was not initialized correctly, causing authentication failures.
Kafka components using AWS authentication could fail to connect, preventing message publishing and consumption.
A bug in the Kafka AWS auth config initialization prevented proper setup of authentication parameters.
Fixed the initialization logic in the Kafka AWS auth configuration to ensure proper authentication and connectivity.
Users experiencing issues with the Placement server do not get enough information from the debug logs to troubleshoot or to understand what state the Placement server is in.
Inability to troubleshoot the Placement server.
Add more debug logs with detailed information about the Placement server's dissemination logic.
Workflow workers connect to daprd but the workflow actors are never registered, so existing workflows do not execute and new workflows cannot be scheduled.
The Workflows API becomes unavailable.
When the durabletask-go library executed the "on GetWorkItems connection" callback and that callback failed to actually register the actors and returned an error, the "on GetWorkItems disconnect" callback was not invoked. As a result, the sidecar never tried to register the actors again, because the workflow engine kept a counter that was incremented by 1 but never decremented.
Refactor durabletask-go to guarantee that the "on disconnect" callback will always be invoked if the "on connection" callback has been invoked.
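The invariant, as a small sketch (the callback names and wrapper are illustrative, not durabletask-go's API):

```go
// Illustrative pairing guarantee: once onConnect has run, onDisconnect
// must run on every exit path, including onConnect's own error path.
package main

import (
	"errors"
	"fmt"
)

func handleGetWorkItems(onConnect func() error, onDisconnect func(), serve func() error) error {
	if err := onConnect(); err != nil {
		// onConnect ran, so onDisconnect must run too; otherwise the
		// engine's counter is incremented but never decremented.
		onDisconnect()
		return err
	}
	defer onDisconnect() // guaranteed on every remaining exit path
	return serve()
}

func main() {
	err := handleGetWorkItems(
		func() error { return errors.New("failed to register actors") },
		func() { fmt.Println("disconnect callback invoked") },
		func() error { return nil },
	)
	fmt.Println("handler returned:", err)
}
```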