docs/release_notes/v1.17.2.md
This update includes security fixes, a breaking change, a new component, and bug fixes.
Three vulnerabilities were identified in the Go standard library used by Dapr 1.17.1 (Go 1.24.13):
- `html/template`: allows potential cross-site scripting via crafted URLs.
- `os`: a `FileInfo` can escape from a `Root`, potentially allowing access to files outside an intended directory boundary.
- `net/url`: parsing issues that could lead to unexpected URL routing or SSRF in applications that parse user-supplied URLs.

Applications using `html/template`, `os.Root`-scoped file operations, or `net/url` URL parsing are potentially affected by these vulnerabilities. All three are fixed in Go 1.25.8.
The vulnerabilities are in the Go standard library and are not specific to Dapr code. They affect any Go program compiled with Go versions prior to 1.25.8.
Upgraded the Go toolchain from 1.24.13 to 1.25.8 across all modules and Docker images in the repository.
The RavenDB state store component from components-contrib was not registered in the Dapr runtime, so it could not be used as a state store in Dapr applications.
Users could not use RavenDB as a state store backend with Dapr, despite the component implementation being available in components-contrib.
The component registration file for the RavenDB state store was missing from the Dapr runtime's component loader (cmd/daprd/components/).
Added the state_ravendb.go registration file to register the RavenDB state store component with the default state store registry. The component is available when building with the allcomponents build tag. The ravendb-go-client dependency was added to go.mod.
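For reference, daprd discovers components through init-time registration. The sketch below shows that pattern in a self-contained form; the registry type and constructor here are hypothetical simplifications, not Dapr's actual API:

```go
package main

import "fmt"

// Store is a minimal stand-in for a state store interface.
type Store interface {
	Name() string
}

// registry maps component names to constructors, mirroring how
// daprd's default state store registry works.
var registry = map[string]func() Store{}

// RegisterComponent adds a constructor under a name; daprd calls an
// equivalent from each file under cmd/daprd/components/ via init().
func RegisterComponent(name string, factory func() Store) {
	registry[name] = factory
}

type ravenDBStore struct{}

func (r *ravenDBStore) Name() string { return "state.ravendb" }

// init runs at program start, which is why a missing registration
// file silently makes a component unavailable at runtime.
func init() {
	RegisterComponent("state.ravendb", func() Store { return &ravenDBStore{} })
}

func main() {
	factory, ok := registry["state.ravendb"]
	fmt.Println(ok, factory().Name())
}
```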
The Configuration CRD defined the stateRetentionPolicy fields (anyTerminal, completed, failed, terminated) as type: integer, format: int64, but the Go API types use metav1.Duration which serializes as strings (e.g. "1s", "168h").
This mismatch caused Kubernetes to reject valid duration string values for these fields, and prevented the workflow state retention policy from being configured correctly via the Kubernetes Configuration CRD.
Users running Dapr in Kubernetes mode could not configure the workflow state retention policy using the Configuration CRD with human-readable duration strings like "1s" or "168h". Kubernetes validation rejected these values because the CRD schema expected integers.
Additionally, even if integer nanosecond values were used to bypass the CRD schema validation, the internal configuration deserializer could not correctly unmarshal the metav1.Duration string format sent by the operator, causing daprd to fail with:
```
Fatal error from runtime: error loading configuration: json: cannot unmarshal string into Go struct field WorkflowStateRetentionPolicy.spec.workflow.stateRetentionPolicy.anyTerminal of type time.Duration
```
The Configuration CRD YAML (charts/dapr/crds/configuration.yaml) was not regenerated after the Go API type WorkflowStateRetentionPolicy was updated to use *metav1.Duration fields.
Updated the CRD schema to use type: string for all stateRetentionPolicy fields, matching the metav1.Duration serialization format.
Added a custom UnmarshalJSON method on the internal config.WorkflowStateRetentionPolicy struct that deserializes via the configapi.WorkflowStateRetentionPolicy type (which uses *metav1.Duration), correctly handling both the Kubernetes CRD string format and the standalone YAML format.
This fix requires a CRD update. Kubernetes does not automatically update CRDs when upgrading Dapr via Helm, so you must update the CRDs manually before upgrading. See the Kubernetes upgrade guide for detailed instructions on how to force-update CRDs.
To update CRDs manually:
```
kubectl apply -f https://raw.githubusercontent.com/dapr/dapr/v1.17.2/charts/dapr/crds/configuration.yaml
```
When the scheduler cluster membership changed (including during initial startup), one-shot jobs or jobs with a Drop failure policy could be triggered more than once.
Jobs configured with DueTime (one-shot) or a Drop failure policy could be delivered to the application multiple times instead of at most once.
This was more likely to occur during scheduler startup or when the scheduler cluster membership changed, as etcd can emit multiple membership events in quick succession.
A race condition existed between two asynchronous event loops in daprd's scheduler connection management. The hosts loop manages gRPC client connections to the scheduler, and the connector loop manages the stream-based cluster that runs on those connections.
When the hosts loop received a second set of scheduler host addresses (e.g. from an etcd membership event during startup), it immediately closed the first set of gRPC client connections before the connector loop had a chance to gracefully stop the cluster running on those connections. This caused active streams to break mid-flight, in-flight job triggers to be marked as undeliverable and re-staged, and jobs to fire again when new streams connected.
Moved gRPC connection lifecycle management from the hosts loop to the connector loop.
The hosts loop now passes connection close functions to the connector via the Connect event, and the connector closes old connections only after it has gracefully stopped the previous cluster.
This ensures connections are never closed while streams are still active.
The Dapr Scheduler service fails to start in Kubernetes with a fatal error:
```
Fatal error running scheduler: failed to create etcd config: peer certificate does not contain the expected DNS name dapr-scheduler-server-1.dapr-scheduler-server.dapr-system.svc.cluster.local. got [dapr-scheduler-server-0.dapr-scheduler-server.dapr-system.svc.cluster.local dapr-scheduler-server-1.dapr-scheduler-server.dapr-system.svc.cluster.local dapr-scheduler-server-2.dapr-scheduler-server.dapr-system.svc.cluster.local]
```
The Scheduler service cannot start in any Kubernetes cluster where the DNS CNAME lookup for the cluster domain returns a fully-qualified domain name with a trailing dot (standard DNS behavior). This prevents all scheduler-based functionality including job scheduling.
The scheduler resolves the Kubernetes cluster domain via a DNS CNAME lookup. Per DNS convention, CNAME responses include a trailing dot (e.g. cluster.local.).
The code only stripped leading dots from the result, leaving the trailing dot intact.
This caused the etcd peer TLS server name to end with an extra dot, which did not match the certificate SANs and failed validation.
Changed strings.TrimLeft to strings.Trim to strip dots from both ends of the parsed cluster domain, ensuring the trailing dot from DNS CNAME responses is removed.
When sending a request with a streaming body (chunked transfer encoding) through Dapr HTTP service invocation, the sidecar buffered the entire request body in memory before forwarding it. For large payloads—such as file uploads or long-running data streams—this caused excessive memory usage and potential out-of-memory crashes.
Any HTTP service invocation request without a known Content-Length (e.g. chunked uploads, streamed data, piped bodies) had its entire body buffered in memory by the sending sidecar.
This made Dapr unsuitable for streaming large payloads between services and could cause sidecar OOM kills in production.
The sidecar's retry mechanism unconditionally buffered the request body into memory so it could replay the body on retry. For streaming requests, the body cannot be replayed because it is consumed as it is read, making the buffering both unnecessary and harmful.
The sidecar now detects streaming requests (those with no known content length) and skips request body buffering entirely.
Both the built-in retry logic and any user-configured resiliency retry policies are automatically bypassed for streaming requests, since retrying would require re-reading a body that has already been consumed.
Non-streaming requests with a known Content-Length continue to support retries as before.
When proxying HTTP responses through service invocation, the sidecar buffered the entire response body in memory before forwarding it to the caller. For large or unbounded streaming responses, this caused excessive memory usage and potential out-of-memory crashes.
Any service invocation response with a large or streaming body could cause sidecar OOM kills, regardless of HTTP status code. This made Dapr unsuitable for proxying streaming responses such as server-sent events, file downloads, or long-running data streams between services.
The sidecar's resiliency mechanism read the full response body into memory so it could evaluate whether to retry the request. When the request itself is a stream that has already been consumed, retries are impossible regardless of the response, making the buffering unnecessary.
For streaming requests, the sidecar now forwards response bodies directly to the caller without buffering them in memory. Resiliency features like circuit breakers continue to track failures normally. Non-streaming requests continue to support retries and buffered error handling as before.
When using the Oracle Database state store component, a BulkGet request that encountered an error for one or more keys returned an HTTP 500 error for the entire request instead of returning per-key errors alongside successful results.
Applications using BulkGet with the Oracle Database state store could not retrieve any results if even a single key encountered an error. Instead of receiving successful results for valid keys with per-key errors for failed keys, the entire operation failed with an HTTP 500 response.
The BulkGet implementation in the Oracle Database state store component returned a top-level error when any individual key retrieval failed, rather than collecting the error and associating it with the specific key in the response.
Updated the BulkGet implementation to return per-key errors in the BulkGetResponse items instead of returning a top-level error. Successful key retrievals are now returned alongside any per-key errors, matching the expected state store BulkGet contract.
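A simplified sketch of the per-key error pattern (the types and names here are illustrative, not the actual component code):

```go
package main

import (
	"errors"
	"fmt"
)

// BulkGetItem mirrors the shape of a state store bulk-get response
// item: either data or a per-key error, never a top-level failure.
type BulkGetItem struct {
	Key   string
	Data  []byte
	Error string
}

var store = map[string][]byte{"good": []byte("v1")}

func getOne(key string) ([]byte, error) {
	v, ok := store[key]
	if !ok {
		return nil, errors.New("key not found")
	}
	return v, nil
}

// bulkGet collects per-key errors instead of aborting the whole
// request on the first failure, matching the BulkGet contract.
func bulkGet(keys []string) []BulkGetItem {
	items := make([]BulkGetItem, 0, len(keys))
	for _, k := range keys {
		item := BulkGetItem{Key: k}
		if data, err := getOne(k); err != nil {
			item.Error = err.Error() // attach the error to this key only
		} else {
			item.Data = data
		}
		items = append(items, item)
	}
	return items
}

func main() {
	for _, it := range bulkGet([]string{"good", "missing"}) {
		fmt.Printf("%s data=%q err=%q\n", it.Key, it.Data, it.Error)
	}
}
```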
When the Pulsar pub/sub component was configured with an Avro schema, JSON messages were published without being validated against the schema. Invalid messages that did not conform to the Avro schema were accepted and published to the topic.
Applications relying on Avro schema enforcement at the Pulsar pub/sub layer could publish malformed messages that did not conform to the expected schema. Downstream consumers expecting schema-compliant messages could encounter deserialization failures or data integrity issues.
The Pulsar pub/sub component did not validate JSON message payloads against the configured Avro schema before publishing. The schema was used only for consumer-side deserialization, not for producer-side validation.
Added JSON-to-Avro schema validation in the publish path. Before publishing, the component now validates JSON message payloads against the configured Avro schema and returns an error if the message does not conform, preventing invalid messages from being published to the topic.
After upgrading to Dapr 1.17.x, deployments with many replicas (e.g. 50+) experience frequent "dissemination timeout after 8s" errors, and /placement/state shows only a fraction of the expected hosts.
Actor invocations fail intermittently because most sidecars never receive a complete placement table. Rolling restarts and scaling events amplify the problem, making large actor deployments unstable.
Three issues combined to cause a cascading failure during dissemination:
- `currentVersion > version` always evaluated to false.
- Stale UNLOCK messages were incorrectly applied.

The conversation component using the LangChain Go Kit could panic with a nil pointer dereference when the LLM logger was invoked.
Applications using the conversation API with the LangChain Go Kit-based component could experience unexpected crashes due to a nil pointer dereference, causing the Dapr sidecar to restart.
The LLM logger callback in the LangChain Go Kit conversation component was called with a nil pointer, and the logger did not perform a nil check before accessing the pointer.
Added a nil pointer check in the LangChain Go Kit LLM logger to prevent the dereference, ensuring the conversation component handles the case gracefully without panicking.
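The guard is a standard nil check before dereferencing; a minimal illustration with a hypothetical payload type:

```go
package main

import "fmt"

// LLMInfo is a stand-in for the payload passed to the LLM logger
// callback; in the reported crash this pointer was nil.
type LLMInfo struct {
	Model string
}

// logLLM guards against a nil payload before dereferencing it,
// mirroring the nil check added to the LangChain Go Kit logger.
func logLLM(info *LLMInfo) {
	if info == nil {
		fmt.Println("llm: <no info>")
		return
	}
	fmt.Println("llm:", info.Model)
}

func main() {
	logLLM(&LLMInfo{Model: "llama3"})
	logLLM(nil) // previously this path panicked with a nil pointer dereference
}
```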
Workflow activities that return results larger than ~2MB fail with a ResourceExhausted gRPC error when scheduling the activity result reminder via the scheduler:
```
Error scheduling reminder job activity-result-XXXX due to: rpc error: code = ResourceExhausted desc = trying to send message larger than max (37950104 vs. 2097152)
```
Any workflow activity returning a result larger than the default gRPC send message size limit (~2MB) fails to deliver its result back to the parent orchestration. The orchestration hangs indefinitely waiting for the activity result, eventually timing out or stalling.
The scheduler gRPC client configured MaxCallRecvMsgSize to allow receiving large messages, but did not configure MaxCallSendMsgSize. This left the send-side limit at the gRPC default (~2MB). When an activity completes, its result is serialized into a reminder job request sent to the scheduler. If the activity result exceeds the default limit, the gRPC client rejects the outgoing message before it reaches the server.
Added MaxCallSendMsgSize to the scheduler gRPC client dial options, matching the existing MaxCallRecvMsgSize configuration.
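The change amounts to one additional dial option. A configuration sketch assuming grpc-go; the constant name `maxMessageSize` is illustrative, not Dapr's actual identifier:

```go
// Raise both directions of the per-call message size limit; without
// MaxCallSendMsgSize, the send side stays at the ~2MB gRPC default.
opts := []grpc.DialOption{
	grpc.WithDefaultCallOptions(
		grpc.MaxCallRecvMsgSize(maxMessageSize), // already present
		grpc.MaxCallSendMsgSize(maxMessageSize), // added by this fix
	),
}
```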
When using the Bulk Publish API with a pub/sub component that has NamespaceScoped enabled, messages were published to the un-namespaced topic instead of the namespace-prefixed topic.
Applications using namespace-scoped pub/sub components with the Bulk Publish API experienced silent message loss. Bulk-published messages were routed to the wrong topic (e.g. the un-namespaced exchange), while subscribers were listening on the namespace-prefixed topic. The regular Publish API was not affected, so only bulk publish users encountered this issue.
The Publish method in publisher.go prepends the namespace to req.Topic when NamespaceScoped is true, but the BulkPublish method did not include this same namespace-prefixing step. This caused bulk-published messages to bypass the namespace scoping entirely.
Added the namespace prefix guard to BulkPublish in publisher.go, immediately after scope validation and before either the native BulkPublisher or defaultBulkPublisher fallback path is invoked. This ensures bulk-published messages are routed to the same namespace-prefixed topic as regular published messages.
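Reduced to a sketch, the guard applies the same namespace prefixing to bulk publishes that `Publish` already applies to `req.Topic` (function and parameter names here are illustrative):

```go
package main

import "fmt"

// prefixTopic applies namespace scoping to a topic name, as Publish
// does and as BulkPublish now also does; names are illustrative.
func prefixTopic(topic, namespace string, namespaceScoped bool) string {
	if namespaceScoped {
		return namespace + topic
	}
	return topic
}

func main() {
	// With NamespaceScoped enabled, both publish paths must target
	// the namespace-prefixed topic that subscribers listen on.
	fmt.Println(prefixTopic("orders", "prod", true))  // prodorders
	fmt.Println(prefixTopic("orders", "prod", false)) // orders
}
```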
When a workflow used WaitForSingleEvent with a timeout, a timer reminder was created in the scheduler. If the external event was raised before the timer fired, the timer reminder was never deleted and remained as an orphan in the scheduler until it eventually fired unnecessarily.
Additionally, when a workflow completed while timers were still pending (e.g. a CreateTimer that had not yet fired), those timer reminders were also left behind.
Workflows using WaitForSingleEvent with timeouts accumulated orphan timer reminders in the scheduler. These timers would eventually fire and trigger unnecessary workflow actor invocations that were silently ignored, wasting scheduler and actor resources.
For long-running workflows with many WaitForSingleEvent calls or long timeouts, the number of orphan reminders could grow significantly.
The durable task SDK completes the event task when an external event is received, but does not signal the Dapr runtime to delete the associated timer reminder. The runtime had no mechanism to detect that a timer was no longer needed because its associated event had already been received. Similarly, when a workflow completed, there was no cleanup of pending timer reminders that had not yet fired.
Added two timer cleanup mechanisms to the workflow orchestrator:
Mid-execution cleanup (deleteCancelledEventTimers): After each workflow execution step, the runtime scans the history for TimerCreated events associated with WaitForSingleEvent calls (identified by the Name field on TimerCreated). When a matching EventRaised event is found in the new events, the corresponding timer reminder is deleted from the scheduler. Event name matching is case-insensitive, and already-deleted timers (e.g. from a crash recovery) are handled gracefully by ignoring NotFound errors.
Completion cleanup (deleteAllReminders): When a workflow completes and has unfired timers (detected by comparing TimerCreated vs TimerFired event counts), all reminders for the workflow and its activities are bulk-deleted via DeleteByActorID. This handles timers without a Name field (e.g. CreateTimer) that cannot be matched to specific events.
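The unfired-timer detection in the completion cleanup can be sketched as a simple count comparison over the history; everything beyond the `TimerCreated`/`TimerFired` event names is illustrative:

```go
package main

import "fmt"

// Event is a minimal stand-in for a workflow history event.
type Event struct{ Kind string }

// hasUnfiredTimers reports whether a completed workflow still has
// pending timer reminders, detected by comparing TimerCreated and
// TimerFired counts in the history.
func hasUnfiredTimers(history []Event) bool {
	created, fired := 0, 0
	for _, e := range history {
		switch e.Kind {
		case "TimerCreated":
			created++
		case "TimerFired":
			fired++
		}
	}
	return created > fired
}

func main() {
	history := []Event{
		{Kind: "TimerCreated"}, // WaitForSingleEvent timeout timer
		{Kind: "EventRaised"},  // the event arrived before the timer fired
	}
	fmt.Println(hasUnfiredTimers(history)) // true → bulk-delete reminders
}
```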
The Ollama conversation component's metadata spec was missing the endpoint metadata field, which is required to configure the Ollama server URL.
Users configuring the Ollama conversation component could not discover the endpoint metadata field through the component spec. The field was functional in code but not declared in the component metadata spec, making it invisible to tooling and documentation that relies on the spec.
The endpoint metadata field was omitted from the Ollama conversation component's metadata.yaml spec file.
Added the endpoint metadata field to the Ollama conversation component spec (conversation/ollama/metadata.yaml).
The Dapr CLI dapr workflow list command failed when MongoDB was configured as the workflow actor state store.
Users using MongoDB as their workflow actor state store could not list workflow instances via the Dapr CLI. The list operation requires prefix-based key queries to enumerate workflow instances, which MongoDB did not support.
The MongoDB state store component did not implement the KeysLiker interface, which provides prefix-based key listing functionality. The Dapr CLI's workflow list operation depends on this interface to query workflow instance keys by prefix.
Implemented the KeysLiker interface on the MongoDB state store component, enabling the prefix-based key listing queries required by the Dapr CLI workflow list command.
When the LangChain Go Kit conversation component received a response from the LLM that included required tool calls, but those tool calls were not actually invoked, no error was returned to the caller.
Applications using the conversation API with the LangChain Go Kit component could silently receive incomplete responses when the LLM requested tool calls that were not executed. The caller had no indication that the response was missing expected tool call results.
The LangChain Go Kit conversation component did not check whether tool calls flagged as required by the LLM were actually invoked during the conversation turn.
Added error handling to return an error when the LLM response includes required tool calls that were not invoked, ensuring the caller is informed of the incomplete response.
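A sketch of the check (names are illustrative, not the component's actual API): compare the tool calls the LLM marked as required against the set actually invoked, and surface a mismatch as an error.

```go
package main

import "fmt"

// checkRequiredToolCalls returns an error when tool calls the LLM
// marked as required were never invoked, instead of silently
// returning an incomplete response.
func checkRequiredToolCalls(required, invoked []string) error {
	seen := make(map[string]bool, len(invoked))
	for _, name := range invoked {
		seen[name] = true
	}
	for _, name := range required {
		if !seen[name] {
			return fmt.Errorf("required tool call %q was not invoked", name)
		}
	}
	return nil
}

func main() {
	err := checkRequiredToolCalls([]string{"get_weather"}, nil)
	fmt.Println(err) // required tool call "get_weather" was not invoked
}
```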
Sentry fails to sign workload certificates with the error:
```
x509: requested SignatureAlgorithm does not match private key type
```
This occurs when the CSR signature algorithm does not match the issuer key type. For example, when a sidecar generates an Ed25519 CSR but the Sentry issuer key is ECDSA, or vice versa. This breaks version skew scenarios where the sidecar and control plane use different key types.
Sidecars cannot obtain workload certificates from Sentry during version skew upgrades where the sidecar and Sentry use different cryptographic key types. All mTLS-secured communication fails, preventing the sidecar from starting.
Sentry copied the SignatureAlgorithm from the incoming CSR onto the workload certificate template. When x509.CreateCertificate was called, Go's x509 library rejected the mismatch between the template's signature algorithm (from the CSR) and the issuer's private key type.
Removed the hardcoded SignatureAlgorithm from certificate templates and the SignRequest struct. Go's x509.CreateCertificate now infers the correct signature algorithm from the issuer's signing key, allowing Sentry to sign certificates regardless of the CSR's key type.