aspnetcore/grpc/retries.md
gRPC retries is a feature that allows gRPC clients to automatically retry failed calls. This article discusses how to configure a retry policy to make resilient, fault-tolerant gRPC apps in .NET.
gRPC retries requires Grpc.Net.Client version 2.36.0 or later.
gRPC calls can be interrupted by transient faults, such as a momentary loss of network connectivity, a temporarily unavailable service, or a timeout caused by server load.
When a gRPC call is interrupted, the client throws an RpcException with details about the error. The client app must catch the exception and choose how to handle the error.
var client = new Greeter.GreeterClient(channel);
try
{
    var response = await client.SayHelloAsync(
        new HelloRequest { Name = ".NET" });
    Console.WriteLine("From server: " + response.Message);
}
catch (RpcException ex)
{
    // Write logic to inspect the error and retry
    // if the error is from a transient fault.
}
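To make the cost of hand-written retry logic concrete, here is a minimal sketch of what the catch block above could grow into. The `ManualRetry` class and `CallWithRetryAsync` method are hypothetical names introduced for illustration, and treating only `StatusCode.Unavailable` as transient is an assumption; a real app may need to inspect other status codes.

```csharp
using System;
using System.Threading.Tasks;
using Grpc.Core;

// Hypothetical helper, not part of the gRPC client library.
public static class ManualRetry
{
    public static async Task<T> CallWithRetryAsync<T>(
        Func<Task<T>> call, int maxAttempts, TimeSpan delay)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await call();
            }
            // Assumption: only Unavailable is treated as a transient fault.
            // On the last attempt the filter is false, so the exception propagates.
            catch (RpcException ex) when (
                ex.StatusCode == StatusCode.Unavailable && attempt < maxAttempts)
            {
                await Task.Delay(delay);
            }
        }
    }
}
```

A unary call could then be wrapped as `await ManualRetry.CallWithRetryAsync(() => client.SayHelloAsync(request).ResponseAsync, 3, TimeSpan.FromSeconds(1))`. Every call site would need the same wrapping.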
Duplicating retry logic throughout an app is verbose and error prone. Fortunately, the .NET gRPC client has built-in support for automatic retries.
A retry policy is configured once when a gRPC channel is created:
var defaultMethodConfig = new MethodConfig
{
    Names = { MethodName.Default },
    RetryPolicy = new RetryPolicy
    {
        MaxAttempts = 5,
        InitialBackoff = TimeSpan.FromSeconds(1),
        MaxBackoff = TimeSpan.FromSeconds(5),
        BackoffMultiplier = 1.5,
        RetryableStatusCodes = { StatusCode.Unavailable }
    }
};
var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    ServiceConfig = new ServiceConfig { MethodConfigs = { defaultMethodConfig } }
});
The preceding code:
- Creates a MethodConfig. Retry policies can be configured per-method, and methods are matched using the Names property. This method config uses MethodName.Default, so it's applied to all gRPC methods called by this channel.
- Configures a retry policy that automatically retries gRPC calls that fail with the status code Unavailable.
- Configures the channel to use the retry policy by setting GrpcChannelOptions.ServiceConfig.
gRPC clients created with the channel will automatically retry failed calls:
var client = new Greeter.GreeterClient(channel);
var response = await client.SayHelloAsync(
    new HelloRequest { Name = ".NET" });
Console.WriteLine("From server: " + response.Message);
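Because methods are matched through the Names property, a policy can also be scoped to a single method instead of MethodName.Default. The following is a sketch only: the service name `greet.Greeter` and method name `SayHello` are assumptions standing in for the app's own fully qualified service and method names, and the policy values are illustrative.

```csharp
using System;
using Grpc.Core;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration;

// Assumption: "greet.Greeter" / "SayHello" are placeholder names.
var sayHelloConfig = new MethodConfig
{
    Names = { new MethodName { Service = "greet.Greeter", Method = "SayHello" } },
    RetryPolicy = new RetryPolicy
    {
        MaxAttempts = 3,
        InitialBackoff = TimeSpan.FromMilliseconds(500),
        MaxBackoff = TimeSpan.FromSeconds(2),
        BackoffMultiplier = 2,
        RetryableStatusCodes = { StatusCode.Unavailable }
    }
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    // Only calls to the matched method use this retry policy.
    ServiceConfig = new ServiceConfig { MethodConfigs = { sayHelloConfig } }
});
```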
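Calls are retried when:
- The failing status code matches a value in RetryableStatusCodes.
- The previous number of attempts is less than MaxAttempts.
- The call hasn't been committed.
- The deadline hasn't been exceeded.
A gRPC call becomes committed in two scenarios:
- The client receives response headers. The server sends response headers when ServerCallContext.WriteResponseHeadersAsync is called, or when the first message is written to the server response stream.
- The client's outgoing message (or messages, if streaming) exceeds the client's maximum buffer size. MaxRetryBufferSize and MaxRetryBufferPerCallSize are configured on the channel.
Committed calls won't retry, regardless of the status code or the previous number of attempts.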
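As a sketch of where the buffer limits are set, the channel options below raise both values alongside a retry policy. The sizes are illustrative assumptions, not recommendations; suitable values depend on message sizes and available memory.

```csharp
using System;
using Grpc.Core;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration;

var defaultMethodConfig = new MethodConfig
{
    Names = { MethodName.Default },
    RetryPolicy = new RetryPolicy
    {
        MaxAttempts = 5,
        InitialBackoff = TimeSpan.FromSeconds(1),
        MaxBackoff = TimeSpan.FromSeconds(5),
        BackoffMultiplier = 1.5,
        RetryableStatusCodes = { StatusCode.Unavailable }
    }
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    // Illustrative sizes (assumptions): 32 MB shared by all calls on the channel,
    // 2 MB for any single call. A call that buffers more than this is committed.
    MaxRetryBufferSize = 32 * 1024 * 1024,
    MaxRetryBufferPerCallSize = 2 * 1024 * 1024,
    ServiceConfig = new ServiceConfig { MethodConfigs = { defaultMethodConfig } }
});
```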
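Streaming calls can be used with gRPC retries, but there are important considerations when they are used together:
- Server streaming and bidirectional streaming: a call that returns multiple messages from the server won't retry after the first message has been received, because the call is committed at that point. Apps must add their own logic to re-establish these calls.
- Client streaming and bidirectional streaming: a call that sends multiple messages to the server won't retry if the outgoing messages have exceeded the client's maximum buffer size.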
For more information, see When retries are valid.
The backoff delay between retry attempts is configured with InitialBackoff, MaxBackoff, and BackoffMultiplier. More information about each option is available in the gRPC retry options section.
The actual delay between retry attempts is randomized. A randomized delay between 0 and the current backoff determines when the next retry attempt is made. As a result, even when exponential backoff increases the current backoff after each attempt, the actual delay isn't always larger than the previous one. The delay is randomized to prevent retries from multiple calls from clustering together and potentially overloading the server.
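The backoff arithmetic described above can be sketched in a few lines. This is an illustration of the description in this article, not the client's actual implementation; `BackoffSketch`, `BackoffCeilings`, and `RandomizedDelay` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the backoff rules, not library code.
public static class BackoffSketch
{
    // Returns the current backoff (the upper bound of the random delay)
    // that applies before each retry.
    public static List<TimeSpan> BackoffCeilings(
        int maxAttempts, TimeSpan initialBackoff, TimeSpan maxBackoff, double multiplier)
    {
        var ceilings = new List<TimeSpan>();
        var current = initialBackoff;
        // maxAttempts includes the original attempt, so there are maxAttempts - 1 retries.
        for (var retry = 1; retry < maxAttempts; retry++)
        {
            ceilings.Add(current);
            var next = TimeSpan.FromTicks((long)(current.Ticks * multiplier));
            current = next < maxBackoff ? next : maxBackoff;
        }
        return ceilings;
    }

    // Picks a delay uniformly between 0 and the current backoff.
    public static TimeSpan RandomizedDelay(TimeSpan ceiling, Random random)
        => TimeSpan.FromTicks((long)(random.NextDouble() * ceiling.Ticks));
}
```

With the policy shown earlier (MaxAttempts of 5, InitialBackoff of 1 second, MaxBackoff of 5 seconds, BackoffMultiplier of 1.5), the ceilings for the four retries are 1, 1.5, 2.25, and 3.375 seconds. Each actual delay is a random value below its ceiling, which is why a later retry can wait less time than an earlier one.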
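gRPC retries can be detected by the presence of grpc-previous-rpc-attempts metadata. The grpc-previous-rpc-attempts metadata:
- Is automatically added to retried calls and sent to the server.
- Has a value that represents the number of preceding attempts.
- Always has an integer value.
Consider the following retry scenario:
1. The client makes a gRPC call to the server.
2. The server fails and returns a response with a retryable status code.
3. The client retries the gRPC call. Because there was one previous attempt, grpc-previous-rpc-attempts metadata has a value of 1. Metadata is sent to the server with the retry.
4. The server succeeds and returns OK.
5. The client reports success. grpc-previous-rpc-attempts is in the response metadata and has a value of 1.
The grpc-previous-rpc-attempts metadata is not present on the initial gRPC call, is 1 for the first retry, 2 for the second retry, and so on.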
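On the server, the metadata can be read from the request headers. The sketch below assumes a service generated from a greeter proto file; `GreeterService`, `HelloReply`, and the logging message are assumptions introduced for illustration.

```csharp
using System.Threading.Tasks;
using Grpc.Core;
using Microsoft.Extensions.Logging;

// Assumption: Greeter.GreeterBase, HelloRequest, and HelloReply are types
// generated from the app's own proto file.
public class GreeterService : Greeter.GreeterBase
{
    private readonly ILogger<GreeterService> _logger;

    public GreeterService(ILogger<GreeterService> logger) => _logger = logger;

    public override Task<HelloReply> SayHello(HelloRequest request, ServerCallContext context)
    {
        // The header is absent (null) on the initial call and "1", "2", ... on retries.
        var previousAttempts = context.RequestHeaders.GetValue("grpc-previous-rpc-attempts");
        if (previousAttempts != null)
        {
            _logger.LogInformation(
                "Retried call received after {Attempts} previous attempt(s).", previousAttempts);
        }

        return Task.FromResult(new HelloReply { Message = "Hello " + request.Name });
    }
}
```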
The following table describes options for configuring gRPC retry policies:
| Option | Description |
|---|---|
| MaxAttempts | The maximum number of call attempts, including the original attempt. This value is limited by GrpcChannelOptions.MaxRetryAttempts which defaults to 5. A value is required and must be greater than 1. |
| InitialBackoff | The initial backoff delay between retry attempts. A randomized delay between 0 and the current backoff determines when the next retry attempt is made. After each attempt, the current backoff is multiplied by BackoffMultiplier. A value is required and must be greater than zero. |
| MaxBackoff | The maximum backoff places an upper limit on exponential backoff growth. A value is required and must be greater than zero. |
| BackoffMultiplier | The backoff will be multiplied by this value after each retry attempt and will increase exponentially when the multiplier is greater than 1. A value is required and must be greater than zero. |
| RetryableStatusCodes | A collection of status codes. A gRPC call that fails with a matching status will be automatically retried. For more information about status codes, see Status codes and their use in gRPC. At least one retryable status code is required. |
Hedging is an alternative retry strategy. Hedging enables aggressively sending multiple copies of a single gRPC call without waiting for a response. Hedged gRPC calls may be executed multiple times on the server and the first successful result is used. It's important that hedging is only enabled for methods that are safe to execute multiple times without adverse effect.
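Hedging has pros and cons when compared to retries:
- An advantage of hedging is that it might return a successful result faster. Multiple gRPC calls run simultaneously, and the call completes when the first successful result is available.
- A disadvantage of hedging is that it can be wasteful. Multiple calls could be made and all succeed, but only the first result is used and the rest are discarded.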
A hedging policy is configured like a retry policy. Note that a hedging policy can't be combined with a retry policy.
var defaultMethodConfig = new MethodConfig
{
    Names = { MethodName.Default },
    HedgingPolicy = new HedgingPolicy
    {
        MaxAttempts = 5,
        NonFatalStatusCodes = { StatusCode.Unavailable }
    }
};
var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    ServiceConfig = new ServiceConfig { MethodConfigs = { defaultMethodConfig } }
});
The following table describes options for configuring gRPC hedging policies:
| Option | Description |
|---|---|
| MaxAttempts | The hedging policy will send up to this number of calls. MaxAttempts represents the total number of all attempts, including the original attempt. This value is limited by GrpcChannelOptions.MaxRetryAttempts which defaults to 5. A value is required and must be 2 or greater. |
| HedgingDelay | The first call is sent immediately; subsequent hedging calls are delayed by this value. When the delay is set to zero or null, all hedged calls are sent immediately. HedgingDelay is optional and defaults to zero. A value must be zero or greater. |
| NonFatalStatusCodes | A collection of status codes which indicate other hedge calls may still succeed. If a non-fatal status code is returned by the server, hedged calls will continue. Otherwise, outstanding requests will be canceled and the error returned to the app. For more information about status codes, see Status codes and their use in gRPC. |
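The HedgingDelay option isn't used in the earlier hedging example, so here is a sketch of a policy that staggers hedged calls instead of sending them all at once. The 200 millisecond delay and the attempt count are illustrative assumptions.

```csharp
using System;
using Grpc.Core;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration;

var hedgingMethodConfig = new MethodConfig
{
    Names = { MethodName.Default },
    HedgingPolicy = new HedgingPolicy
    {
        // Up to 3 calls in total, including the original call.
        MaxAttempts = 3,
        // Illustrative value (assumption): later calls are staggered by 200 ms
        // rather than being sent immediately.
        HedgingDelay = TimeSpan.FromMilliseconds(200),
        NonFatalStatusCodes = { StatusCode.Unavailable }
    }
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
    ServiceConfig = new ServiceConfig { MethodConfigs = { hedgingMethodConfig } }
});
```

A nonzero delay trades some of hedging's latency benefit for less duplicated work on the server, because a fast successful response can arrive before the next hedged call is sent.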