Back to Dapr

Dapr 1.5.1

docs/release_notes/v1.5.1.md

1.17.67.7 KB
Original Source

Dapr 1.5.1

Summary

This release addresses several issues with the following fixes:

Runtime

CLI

Upgrading

Important: If upgrading to this version using Helm instead of the Dapr CLI, you will need to update the Subscription CRD prior to performing the Helm upgrade.

cli
kubectl replace -f https://raw.githubusercontent.com/dapr/dapr/v1.4.4/charts/dapr/crds/subscription.yaml

Details

Retry logic for TooManyRequests initialization error for CosmosDb State and Binding components & New production guidance

Overview

  • New strongly advised production guidance for Azure Cosmos DB components have been established.
  • Dapr now retries connecting to Azure Cosmos DB during sidecar initialization.

Problem

Some sidecars with Azure Cosmos DB components (both Output Binding and State Store) fail to initialize, causing the sidecar to restart or hang (up to the initTimeout duration which is 5 seconds by default). Sidecar logs show a response from Cosmos DB with status 429 Request rate too large.

Root cause

Every new connection to Azure Cosmos DB initially performs a large number of metadata requests (Cosmos DB documentation). The metadata request rate limit for Azure Cosmos DB Accounts can easily be exceeded when multiple connections to the same Azure Cosmos DB account (even distinct datatabases within the same account) are made simultaneously. If the attempt to initialize the component and connect to Azure Cosmos DB fails due to this rate limit Dapr did not previously retry the connection until the sidecar restarts.

Solution

The following production best practices must be applied to minimize the likelihood this issue will occur:

  • Ensure applications and sidecars only load the Azure Cosmos DB component when it is required for that application, which avoid unnecessary database connections from other microservices or applications. This can be done by scoping your components to specific applications.
  • Choose deployment strategies that sequentially deploy or start all applications to minimize bursts in new connections to your Azure Cosmos DB accounts.
  • Avoid reusing the same Azure Cosmos DB account for unrelated databases or systems (even outside of Dapr). Distinct Azure Cosmos DB accounts have distinct rate limits.

Additionally, Dapr now retries establishing the initial Azure Cosmos DB connection for up to 5 minutes, when the component is configured by setting the initTimeout value.

The default component initialization timeout is 5 seconds. Please update your component definitions specifying a greater initTimeout duration value for the component to attempt reconnections. Note that this will delay the readiness of your sidecar. In Kubernetes you may need to adjust your liveness and readiness probes.

Add scopes to Pub/Sub subscription conversion from v1alpha1 to v2alpha1

Overview

In order to add the Pub/Sub routing preview feature, the subscriptions CRD required a new version v2alpha1. This requires the Dapr operator to provide a conversion webhook that converts v1alpha1 to v2alpha1.

Problem

Subscriptions created as v1alpha1 would drop the scopes when converted by the operator to v2alpha1.

Root cause

The conversion function was not copying the Scopes field from v1apha1 to v2alpha1.

Solution

The conversion function was updated to include copying the Scopes field.

Upgrade github.com/nhooyr/websocket to > 1.8.6 with DoS vulnerability fixes

Overview

From https://security.snyk.io/vuln/SNYK-GOLANG-GITHUBCOMNHOOYRWEBSOCKET-1244800:

github.com/nhooyr/websocket is a minimal and idiomatic WebSocket library for Go.

Affected versions of this package are vulnerable to Denial of Service (DoS). A double channel close panic is possible if a peer sent back multiple pongs for every ping. If the second pong arrived before the ping goroutine deleted its channel from the map, the channel would be closed twice and a panic would occur.

Root cause

Dapr was using v1.8.6 of the package.

Solution

The package was upgrade to v1.8.7.

Fixing race conditions in RabbitMQ Pub/Sub component

Overview

The implementation of RabbitMQ Pub/Sub component was not robust, especially when an application used multiple publishers/subscribers.

Root cause

Concurrent calls to the underlying AMQP layer caused race condition and failures in publishing and subscribing

Solution

This hotfix extends thread synchronization to span across entire "pubish" and "subscribe" operations. This eliminates race conditions between publishers and subscribers during access to the underlying AMQP layer. In addition this hotfix adds a comprehensive certification test for RabbitMQ Pub/Sub component. A successful completion of the test indicates that the component has reached the stable state.

Fixing crash on subscribe in Configuration API Preview

Overview

The SubscribeConfigurationAlpha1 method is used to subscribe to configuration updates. Invoking the SubscribeConfigurationAlpha1 method causes Dapr to crash.

Root cause

The implementation had an uninitialized map which would cause a nil-memory panic.

Solution

The fix initializes the map properly and allows apps to be notified about changes to a configuration item.

Fixing connection leak for gRPC connection to Dapr Operator

Overview

When a large number of pods are destroyed frequently, the server side needs to actively detect whether the connection is available to prevent the leakage of operator channel resources.

Error message can look like:

txt
time="..." level=warning msg="error updating sidecar with component ...: rpc error: code = Unavailable desc = transport is closing" instance=dapr-operator-abc-xyz scope=dapr.operator.api type=log ver=unknown

Root cause

Dapr operator does not close stale connections.

Solution

Dapr operator actively detects whether grpc connection to sidecar is active.

[CLI] Fixing hanging shutdown calls when using Unix Domain Socket

Overview

When using Unix Domain Sockets the Shutdown call hangs requiring the app to be manually terminated.

Root Cause

The library used in the implementation of this feature did not automatically close idle HTTP connections.

Solution

Added shutdown logic to explicitly close idle HTTP connections.

Applications using the 1.4.x release of Dapr should consider taking the related 1.4.4 release (or higher) that addresses a similar set of issues.