docs/release_notes/v1.16.1.md
This update includes bug fixes:
When running Dapr with an `--app-port` specified but no application listening on that port (either because no server exists or because the server starts late), the actor runtime would initialize immediately, before the app channel was ready. This created a race condition in which actors tried to communicate with an application that was not available yet, resulting in repeated error logs:
```
WARN[0064] Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…
DEBU[0064] Error for operation DaprBuiltInActorNotFoundRetries was: failed to lookup actor: api error: code = FailedPrecondition desc = did not find address for actor
```
This made for a poor user experience, with confusing error messages when users specified an `--app-port` but had no application listening on that port.
The actor runtime initialization was occurring before the application channel was ready, creating a race condition where actors attempted to communicate with an unavailable application.
Defer actor runtime initialization until the application channel is ready. The runtime now logs `waiting for application to listen on port XXXX` messages instead of the confusing error logs above.
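As a rough illustration of the deferred-initialization idea (a minimal sketch, not the actual daprd code; `waitForAppPort` and the port value are hypothetical), the runtime effectively blocks until the app port accepts connections before bringing up actors:

```go
// Minimal sketch, assuming a TCP app channel; not the daprd implementation.
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForAppPort blocks until something accepts TCP connections on the
// given port, logging a clear "waiting" message between attempts.
func waitForAppPort(port int) {
	addr := fmt.Sprintf("localhost:%d", port)
	for {
		conn, err := net.DialTimeout("tcp", addr, time.Second)
		if err == nil {
			conn.Close()
			return
		}
		fmt.Printf("waiting for application to listen on port %d\n", port)
		time.Sleep(time.Second)
	}
}

func main() {
	waitForAppPort(8080) // hypothetical --app-port value
	// Only now is it safe to initialize the actor runtime.
	fmt.Println("app channel ready; initializing actor runtime")
}
```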
The sidecar injector crashes with the error `dapr-scheduler-server StatefulSet not found` when the scheduler is disabled via the Helm chart (`global.scheduler.enabled: false`).
The crash prevents the sidecar injector from functioning correctly when the scheduler is disabled, disrupting deployments.
A previous change caused the `dapr-scheduler-server` StatefulSet to be removed when the scheduler was disabled, instead of scaling it to 0 as originally intended. The injector, hardcoded in `injector.go` to check for the StatefulSet, fails when it is not found.
Revert to scaling the `dapr-scheduler-server` StatefulSet to 0 replicas in the Helm chart when the scheduler is disabled, instead of removing it.
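For reference, the Helm values excerpt that triggers this path; with the fix, it scales the StatefulSet down rather than deleting the object the injector looks up:

```yaml
# Helm values excerpt. With the fix, disabling the scheduler scales
# dapr-scheduler-server to 0 replicas instead of removing the
# StatefulSet that the injector checks for.
global:
  scheduler:
    enabled: false
```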
When an application's health check transitioned from unhealthy to healthy, the scheduler clients were incorrectly reconfigured to stop watching for actor reminder jobs.
The misconfigured scheduler clients caused workflows to stop executing, because the reminders that drive them no longer fired.
On an application health change, daprd could trigger an actor-type update with an empty slice, which caused a scheduler client reconfiguration. Because the actor types had not actually changed, daprd never received a new version of the placement table. Sending the update had already wiped the known actor types from the scheduler client, and since no acknowledgement with a new table version ever arrived from the placement server, the scheduler client was never repopulated with the actor types.
Prevent any changes to hosted actor types when the input slice is empty, as sketched below.
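A minimal sketch of that guard, with illustrative names (the real daprd types differ):

```go
// Illustrative only: ignore empty actor-type updates so a transient
// app-health flap cannot wipe the scheduler client's known actor types.
package main

import "fmt"

// schedulerClient is a stand-in for daprd's scheduler client state.
type schedulerClient struct {
	actorTypes []string
}

// updateHostedActorTypes skips updates carrying an empty slice.
func (s *schedulerClient) updateHostedActorTypes(types []string) {
	if len(types) == 0 {
		return // empty input: keep the existing actor types
	}
	s.actorTypes = types
}

func main() {
	s := &schedulerClient{actorTypes: []string{"myactor"}}
	s.updateHostedActorTypes(nil) // a health flap produces an empty update: no-op
	fmt.Println(s.actorTypes)     // still [myactor]
}
```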
The Scheduler Etcd client port is not available when running in Dapr CLI standalone mode.
Cannot perform Scheduler Etcd admin operations in Dapr CLI standalone mode.
The Scheduler's etcd client port listens only on localhost.
The Scheduler etcd client listen address is now configurable via the `--scheduler-etcd-client-listen-address` CLI flag, so the port can be exposed when running in standalone mode.
The Scheduler would always treat `--etcd-embed` as true, even when the Helm chart set it to false.
External etcd addresses could not be used, since the Scheduler always assumed embedded etcd was in use.
The Helm template rendered the boolean as a separate argument rather than inline with the flag.
The template format string was fixed so that `.etcdEmbed` can be set to false, as sketched below.
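An illustrative fragment of the kind of template change involved (not the chart's exact lines):

```yaml
# Hypothetical Helm template fragment, not the chart verbatim.
args:
  # Before (broken): the flag and its value rendered as two list items,
  # so the boolean value was treated as a separate argument.
  # - "--etcd-embed"
  # - "{{ .Values.etcdEmbed }}"
  # After (fixed): the flag and its boolean are rendered inline.
  - "--etcd-embed={{ .Values.etcdEmbed }}"
```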
The component init timeout was checked only after the component reporter had already been invoked.
This ordering could lead to false positives: daprd could report a successful initialization and then return an error once the timeout check ran.
Move the timeout check to immediately after the actual component initialization and before the component reporter.
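A sketch of the corrected ordering, with hypothetical names (`initComponent`, `report`); the point is only that the timeout is resolved before anything is reported:

```go
// Illustrative ordering only, not daprd's code: resolve the timeout
// before invoking the reporter, so a timed-out init is never reported
// as a success.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

func initComponent(ctx context.Context, timeout time.Duration, init func() error, report func(error)) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- init() }()

	var err error
	select {
	case err = <-done: // init finished in time
	case <-ctx.Done(): // the timeout check happens first...
		err = errors.New("component init timed out")
	}
	report(err) // ...and only then is the outcome reported
	return err
}

func main() {
	err := initComponent(context.Background(), time.Second,
		func() error { time.Sleep(2 * time.Second); return nil },
		func(err error) { fmt.Println("reported:", err) },
	)
	fmt.Println("returned:", err)
}
```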
The `pubsub.kafka` component failed to publish Avro messages in Dapr 1.16, breaking existing workflows.
Avro messages could not be published correctly, causing failures in Kafka message pipelines and potential data loss or dead-lettering issues.
The Kafka pubsub component did not correctly create codecs in the `SchemaRegistryClient`. Additionally, the goavro library had a bug converting default null values that broke legitimate schemas.
Enabled codec creation in the Kafka `SchemaRegistryClient` and upgraded `github.com/linkedin/goavro/v2` from v2.13.1 to v2.14.0 to fix null value handling. The metadata options `useAvroJson` and `excludeHeaderMetaRegex` were validated to ensure correct message encoding and dead-letter handling. Manual tests confirmed that Avro and JSON message publication works as expected.
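For context, a component manifest of the shape these options apply to; the values shown are illustrative, not defaults:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: kafka-pubsub
spec:
  type: pubsub.kafka
  version: v1
  metadata:
  - name: brokers
    value: "localhost:9092"
  - name: schemaRegistryURL        # illustrative endpoint for Avro codecs
    value: "http://localhost:8081"
  - name: useAvroJson
    value: "true"
```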
Some SFTP servers require files to be closed before they become available for reading. Without closing, read operations could fail or return incomplete data.
SFTP file reads could fail or return incomplete data on certain servers, causing downstream processing issues.
The SFTP component did not explicitly close files after writing, which some servers require to make files readable.
Updated the SFTP component to close files after writing, ensuring they are available for reading on all supported servers.
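A sketch of the write-then-close pattern using `github.com/pkg/sftp` (connection setup elided; `writeThenRead` is an illustrative helper, not the component's API):

```go
// Sketch, not the component's actual code: some SFTP servers only make a
// file readable once the writing handle has been closed.
package example

import (
	"io"

	"github.com/pkg/sftp"
)

// writeThenRead writes data, closes the file, then reopens it for reading.
func writeThenRead(client *sftp.Client, path string, data []byte) ([]byte, error) {
	f, err := client.Create(path)
	if err != nil {
		return nil, err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return nil, err
	}
	// Close before reading: without this, reads can fail or return
	// incomplete data on some servers.
	if err := f.Close(); err != nil {
		return nil, err
	}
	r, err := client.Open(path)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}
```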
The AWS Secrets Manager component failed to correctly parse YAML metadata, causing boolean fields like `multipleKeyValuesPerSecret` to be misinterpreted.
Incorrect metadata parsing could lead to misconfiguration, preventing secrets from being retrieved or handled properly.
The component used a JSON marshal/unmarshal approach in `getSecretManagerMetadata`, which did not handle string-to-boolean conversion correctly for YAML metadata.
Replaced the JSON marshal/unmarshal with `kitmd.DecodeMetadata` to correctly parse YAML metadata and convert string fields to their proper types, ensuring `multipleKeyValuesPerSecret` works as expected.
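A sketch of the decoding change, assuming `kitmd` is `github.com/dapr/kit/metadata` (the struct and field tag here are illustrative):

```go
// Illustrative sketch of decoding string metadata into typed fields.
package example

import (
	kitmd "github.com/dapr/kit/metadata"
)

type secretsManagerMetadata struct {
	// Component metadata arrives as strings ("true"/"false"), so the
	// decoder must handle string-to-bool conversion.
	MultipleKeyValuesPerSecret bool `mapstructure:"multipleKeyValuesPerSecret"`
}

func getSecretManagerMetadata(raw map[string]string) (*secretsManagerMetadata, error) {
	var md secretsManagerMetadata
	// DecodeMetadata performs the type conversions that a plain JSON
	// marshal/unmarshal round-trip does not.
	if err := kitmd.DecodeMetadata(raw, &md); err != nil {
		return nil, err
	}
	return &md, nil
}
```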
After migrating to the AWS v2 Kafka client, a new client was created for every message published, causing inefficiency and unnecessary resource usage.
Frequent client creation led to performance degradation, increased connection overhead, and potential resource exhaustion during high-throughput message publishing.
The AWS v2 client integration did not implement client reuse, resulting in a new client being instantiated for each publish operation.
Updated the Kafka component to reuse clients instead of creating a new one for each message, improving performance and resource efficiency.
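The reuse pattern, as a minimal sketch with stand-in types (the component's actual client and setup differ):

```go
// Illustrative client-reuse pattern: construct the client once and share
// it across publishes, instead of once per message.
package main

import (
	"fmt"
	"sync"
)

type kafkaClient struct{} // stand-in for the real AWS v2 Kafka client

func newKafkaClient() (*kafkaClient, error) { return &kafkaClient{}, nil }

type publisher struct {
	once   sync.Once
	client *kafkaClient
	err    error
}

// getClient lazily creates the shared client exactly once.
func (p *publisher) getClient() (*kafkaClient, error) {
	p.once.Do(func() {
		p.client, p.err = newKafkaClient()
	})
	return p.client, p.err
}

func main() {
	p := &publisher{}
	c1, _ := p.getClient()
	c2, _ := p.getClient()
	fmt.Println(c1 == c2) // true: the same client is reused across publishes
}
```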
The Kafka AWS authentication configuration was not initialized correctly, causing authentication failures.
Kafka components using AWS authentication could fail to connect, preventing message publishing and consumption.
A bug in the Kafka AWS auth config initialization prevented proper setup of authentication parameters.
Fixed the initialization logic in the Kafka AWS auth configuration to ensure proper authentication and connectivity.
Users experiencing issues with the Placement server do not get enough information from the debug logs to troubleshoot or to understand what state the Placement server is in.
Inability to troubleshoot the Placement server.
Add more debug logs with detailed information about the Placement server's dissemination logic.
Workflow workers connect to daprd but the workflow actors are never registered, so existing workflows do not execute and new workflows cannot be scheduled.
The Workflows API becomes unavailable.
When the durabletask-go library executed the "on GetWorkItems connection" callback and that callback failed to actually register the actors and returned an error, the "on GetWorkItems disconnect" callback was not invoked. As a result, the sidecar never tried to register the actors again, because the workflow engine kept a counter that was incremented by 1 but never decremented.
Refactor durabletask-go to guarantee that the "on disconnect" callback will always be invoked if the "on connection" callback has been invoked.
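The invariant, as a small sketch (the callback names and wrapper are illustrative, not durabletask-go's API):

```go
// Illustrative pairing guarantee: once onConnect has run, onDisconnect
// must run on every exit path, including onConnect's own error path.
package main

import (
	"errors"
	"fmt"
)

func handleGetWorkItems(onConnect func() error, onDisconnect func(), serve func() error) error {
	if err := onConnect(); err != nil {
		// onConnect ran, so onDisconnect must run too; otherwise the
		// engine's counter is incremented but never decremented.
		onDisconnect()
		return err
	}
	defer onDisconnect() // guaranteed on every remaining exit path
	return serve()
}

func main() {
	err := handleGetWorkItems(
		func() error { return errors.New("failed to register actors") },
		func() { fmt.Println("disconnect callback invoked") },
		func() error { return nil },
	)
	fmt.Println("handler returned:", err)
}
```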