docs/release_notes/v1.12.4.md
This update includes bug fixes:
When restarting a significant deployment on Kubernetes (> 20 pods), many pods will be removed from placement table. When the restarts happens concurrently with leadership changes in placement service, the placement table can get into a corrupt state that requires the restart of placement service #7311
Impacts users running Dapr 1.12.0-1.12.3 that uses actors and deployments with many sidecar instances.
Race condition between placement leadership changes and placement table updates due to pods being terminated.
Mitigate by reducing the chances of leadership changes in placement service and fix a bug that can cause terminated pods to remain in placement table forever.
When performing service invocation to another app using Dapr (using either HTTP or gRPC), the caller sidecar establishes a gRPC connection with the target sidecar. When the target app is deployed with multiple replicas (scaled horizontally), a connection is established with each replica and requests are load-balanced across all replicas.
In Dapr 1.12.3 and lower running on Kubernetes, due to the way this logic was implemented, if the connection with one of the replicas became idle, it would have caused the connection between the caller sidecar and all replicas to be severed, possibly interrupting other in-flight service invocation calls.
This issue impacts users running Dapr 1.12.3 and lower, running on Kubernetes, that use service invocation to invoke apps that are scaled horizontally.
This issue does not impact users that are running outside of Kubernetes or who are using other Dapr nameresolution components (including mDNS or Consul).
On Kubernetes, the gRPC connection was established with the DNS name of the target Service, and we allowed the gRPC-Go library to perform name resolution with the Kubernetes DNS server and establish a connection with each replica. When the connection with one of the replicas becomes idle, the caller app receives a "GOAWAY" message which is interpreted as a signal to terminate all connections with all replicas of the app.
We have changed the internals of service invocation, so we always perform DNS resolution in the caller Dapr sidecar, also on Kubernetes. The gRPC-Go library receives an individual IP address to connect to, so "GOAWAY" messages only impact the connection with an individual replica.