Cloud Client Libraries for Java use retries to handle unexpected, transient failures (for example, when the server is temporarily unavailable). Multiple attempts can result in a successful response from the server.
Default retry values are selected by the team operating the cloud service. These retry values are configured per RPC. A service can choose to only enable retries for a subset of RPCs. It is possible that each RPC for a service is configured differently.
Retry parameters
Client libraries have two types of retry parameters to configure:
- Retry Status Code: Set of status codes to retry on.
- Retry Timeout or Attempt Bounds: Configurable RetrySettings to define the bounds.
Default RPC Retry configuration location
The default retry configurations are defined in the generated {Client}StubSettings file. Using the ExportAssets RPC in Java-Asset v3.64.0 as an example, the default retry configurations are defined in the following places:
Retry Status Codes: Configured in the AssetServiceStubSettings.java file. Example:
ImmutableMap.Builder<String, ImmutableSet<StatusCode.Code>> definitions = ImmutableMap.builder();
definitions.put("no_retry_0_codes", ImmutableSet.copyOf(Lists.<StatusCode.Code>newArrayList()));
// ... More StatusCode configurations
RETRYABLE_CODE_DEFINITIONS = definitions.build();
Retry Parameters: Configured in the AssetServiceStubSettings.java file. Example:
ImmutableMap.Builder<String, RetrySettings> definitions = ImmutableMap.builder();
RetrySettings settings = null;
settings =
    RetrySettings.newBuilder()
        .setInitialRpcTimeoutDuration(Duration.ofMillis(60000L))
        .setRpcTimeoutMultiplier(1.0)
        .setMaxRpcTimeoutDuration(Duration.ofMillis(60000L))
        .setTotalTimeoutDuration(Duration.ofMillis(60000L))
        .build();
definitions.put("no_retry_0_params", settings);
// ... More RetrySettings configurations
RETRY_PARAM_DEFINITIONS = definitions.build();
Both configurations are mapped to the RPC in the AssetServiceStubSettings.java file. Example:
builder
.exportAssetsSettings()
.setRetryableCodes(RETRYABLE_CODE_DEFINITIONS.get("no_retry_0_codes"))
.setRetrySettings(RETRY_PARAM_DEFINITIONS.get("no_retry_0_params"));
Client library retry concepts
Enabling retries allows an RPC multiple attempts to try and achieve a successful call. A successful call is a response from a server that returns an OK Status Code (from gRPC) or a 2xx Status Code (from HttpJson).
Attempt versus operation
The following RetrySettings configuration modifies the retry settings for both an RPC's attempt and operation:
settings =
RetrySettings.newBuilder()
.setInitialRetryDelayDuration(Duration.ofMillis(100L))
.setRetryDelayMultiplier(1.3)
.setMaxRetryDelayDuration(Duration.ofMillis(60000L))
.setInitialRpcTimeoutDuration(Duration.ofMillis(60000L))
.setRpcTimeoutMultiplier(1.0)
.setMaxRpcTimeoutDuration(Duration.ofMillis(60000L))
.setTotalTimeoutDuration(Duration.ofMillis(60000L))
.build();
An RPC attempt is the individual attempt made and an RPC operation is a collection of all attempts made. A single RPC invocation will have one or more attempts in a single operation.
Individual RPC Bounds (an attempt) are controlled by the following settings:
- setInitialRetryDelayDuration: Delay before the first retry attempt.
- setRetryDelayMultiplier: Delay multiplier applied between each attempt.
- setMaxRetryDelayDuration: Maximum delay possible for an attempt.
- setInitialRpcTimeoutDuration: Timeout for the first attempt.
- setRpcTimeoutMultiplier: Timeout multiplier applied between each attempt.
- setMaxRpcTimeoutDuration: Maximum timeout possible for an attempt.
Total RPC Bounds (an operation) are controlled by the following settings:
- setTotalTimeoutDuration: Total timeout allowed for the entire operation.
- setMaxAttempts: Maximum number of attempts allowed.
When an RPC is retried
An RPC will be retried when both of the following scenarios occur:
- A non-successful status code is received by the library and the status code is marked as retryable.
- An RPC invocation exceeds the individual RPC bounds, but still falls within the total RPC bounds.
If only one scenario is true, or if neither scenario is true, then the RPC won't be retried.
For example, if the total timeout has not been exceeded, but the latest attempt receives a status code that can't be retried, then the RPC isn't retried.
Additionally, when configuring the RPC bounds, you can configure the bounds for each attempt as well as the total RPC's bounds. The retry algorithm ensures that the bounds of an individual attempt fall within the total RPC's bounds.
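The retry conditions described above can be sketched as a single decision function. This is a minimal illustration rather than the client library's implementation: the `RetryDecision` class, its `Code` enum, and the parameter names are hypothetical, and treating `maxAttempts == 0` as "no attempt limit" is an assumption of this sketch.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative sketch only; not the client library's implementation. */
final class RetryDecision {

  /** Hypothetical stand-in for the library's status codes. */
  enum Code { OK, UNAVAILABLE, DEADLINE_EXCEEDED, INVALID_ARGUMENT }

  /**
   * Returns true when another attempt should be scheduled.
   *
   * @param maxAttempts assumption of this sketch: 0 means "no attempt limit"
   */
  static boolean shouldRetry(
      Code code,
      Set<Code> retryableCodes,
      int attemptsMade,
      int maxAttempts,
      long elapsedMs,
      long nextRetryDelayMs,
      long totalTimeoutMs) {
    if (code == Code.OK) {
      return false; // The call succeeded, so there is nothing to retry.
    }
    if (!retryableCodes.contains(code)) {
      return false; // The failure is not marked as retryable.
    }
    if (maxAttempts > 0 && attemptsMade >= maxAttempts) {
      return false; // The attempt budget is used up.
    }
    // The next attempt must start within the total timeout.
    return elapsedMs + nextRetryDelayMs <= totalTimeoutMs;
  }

  public static void main(String[] args) {
    Set<Code> retryable = EnumSet.of(Code.UNAVAILABLE, Code.DEADLINE_EXCEEDED);
    // Retryable code and time left: prints true.
    System.out.println(shouldRetry(Code.UNAVAILABLE, retryable, 1, 0, 1500L, 200L, 5000L));
    // Time left, but the code can't be retried: prints false.
    System.out.println(shouldRetry(Code.INVALID_ARGUMENT, retryable, 1, 0, 1500L, 200L, 5000L));
    // Retryable code, but the next attempt would start after the total timeout: prints false.
    System.out.println(shouldRetry(Code.DEADLINE_EXCEEDED, retryable, 2, 0, 4700L, 400L, 5000L));
  }
}
```

The second call in `main` corresponds to the case where the total timeout has not been exceeded but the status code can't be retried.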
Exponential backoff
Exponential backoff will retry requests with an increasing delay between each retry attempt. This retry delay value can be capped with a maximum retry delay value.
For example, the following retry configurations can result in the following delay times:
Initial Retry Delay: 100ms
Retry Delay Multiplier: 2.0
Max Retry Delay: 500ms
- Attempt 1: Delay 100ms
- Attempt 2: Delay 200ms
- Attempt 3: Delay 400ms
- Attempt 4: Delay 500ms
- ...
- Attempt X: Delay 500ms
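The schedule above follows from multiplying the previous delay by the multiplier and capping the result at the maximum delay. The following is a small sketch of that arithmetic, without jitter; the `BackoffSchedule` class and its method names are hypothetical and are not part of the client library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: computes un-jittered backoff delays. */
final class BackoffSchedule {

  /** Returns the delay for a 1-based attempt number, capped at maxDelayMs. */
  static long delayMillis(int attempt, long initialDelayMs, double multiplier, long maxDelayMs) {
    double delay = initialDelayMs;
    for (int i = 1; i < attempt; i++) {
      // Grow the delay, but never beyond the configured maximum.
      delay = Math.min(delay * multiplier, maxDelayMs);
    }
    return (long) Math.min(delay, maxDelayMs);
  }

  /** Returns the delays for attempts 1..attempts. */
  static List<Long> schedule(int attempts, long initialDelayMs, double multiplier, long maxDelayMs) {
    List<Long> delays = new ArrayList<>();
    for (int attempt = 1; attempt <= attempts; attempt++) {
      delays.add(delayMillis(attempt, initialDelayMs, multiplier, maxDelayMs));
    }
    return delays;
  }

  public static void main(String[] args) {
    // Initial Retry Delay 100ms, Retry Delay Multiplier 2.0, Max Retry Delay 500ms.
    System.out.println(schedule(5, 100L, 2.0, 500L)); // prints [100, 200, 400, 500, 500]
  }
}
```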
Jitter
Jitter adds randomized variance to spread out when the RPCs are invoked. Trusted Cloud by S3NS Client Libraries always enable jitter for retries. This helps spread out the retry attempts so that they don't overwhelm the server.
The jitter random value is computed based on the retry delay. Before each attempt, the retry algorithm will compute a random value between [1, RETRY_DELAY]. This computed value is the approximate delay before the request is sent to the server.
The following retry configuration uses jitter and exponential backoff.
Initial Retry Delay: 100ms
Retry Delay Multiplier: 2.0
Max Retry Delay: 500ms
This could result in the following delay times:
- Attempt 1: Delay a random value between [1, 100] ms
- Attempt 2: Delay a random value between [1, 200] ms
- Attempt 3: Delay a random value between [1, 400] ms
- Attempt 4: Delay a random value between [1, 500] ms
- ...
- Attempt X: Delay a random value between [1, 500] ms
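As a rough sketch of the jitter step, the following picks a random delay in the range [1, RETRY_DELAY] for each of the delay caps listed above. The `JitteredDelay` class is hypothetical, and using `ThreadLocalRandom` is a choice made for this illustration; it is not a statement about how the client library generates its random values.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative sketch only: applies jitter to a backoff delay cap. */
final class JitteredDelay {

  /** Returns a random delay in [1, retryDelayMs], both ends inclusive. */
  static long jitteredDelayMillis(long retryDelayMs) {
    if (retryDelayMs < 1) {
      throw new IllegalArgumentException("retryDelayMs must be at least 1");
    }
    // nextLong(origin, bound) excludes the bound, so add 1 to include retryDelayMs.
    return ThreadLocalRandom.current().nextLong(1, retryDelayMs + 1);
  }

  public static void main(String[] args) {
    // Delay caps from exponential backoff: 100ms initial, 2.0 multiplier, 500ms max.
    long[] delayCapsMs = {100L, 200L, 400L, 500L};
    for (int i = 0; i < delayCapsMs.length; i++) {
      // The printed delay differs from run to run because it is random.
      System.out.printf(
          "Attempt %d: delay %dms (cap %dms)%n",
          i + 1, jitteredDelayMillis(delayCapsMs[i]), delayCapsMs[i]);
    }
  }
}
```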
Retry examples
The following examples show the behavior of some retry configurations.
No retry
This example disables retries.
RetrySettings defaultNoRetrySettings =
RetrySettings.newBuilder()
// Use the default configurations for other settings
.setTotalTimeoutDuration(Duration.ofMillis(5000L))
// Explicitly set retries as disabled (maxAttempts == 1)
.setMaxAttempts(1)
.build();
Alternatively, this behavior can be configured with this example:
RetrySettings defaultNoRetrySettings =
RetrySettings.newBuilder()
.setLogicalTimeoutDuration(Duration.ofMillis(5000L))
.build();
The following table shows the attempts:
Attempt Number | RPC Timeout | Retry Delay | Call Invoked | Call Ended |
---|---|---|---|---|
1 | 5000ms | 0ms | 0ms | 5000ms |
Retry example
This example enables retries with specified delays and timeouts.
RetrySettings.newBuilder()
.setInitialRetryDelayDuration(Duration.ofMillis(200L))
.setRetryDelayMultiplier(2.0)
.setMaxRetryDelayDuration(Duration.ofMillis(500L))
.setInitialRpcTimeoutDuration(Duration.ofMillis(1500L))
.setRpcTimeoutMultiplier(2.0)
.setMaxRpcTimeoutDuration(Duration.ofMillis(3000L))
.setTotalTimeoutDuration(Duration.ofMillis(5000L))
.build();
The following table shows the attempts:
Attempt Number | RPC Timeout | Retry Delay | Call Invoked | Call Ended |
---|---|---|---|---|
1 | 1500ms | 0ms | 0ms | 1500ms |
2 (Retry) | 3000ms | 200ms | 1700ms | 4700ms |
3 (Retry Not Attempted) | - | 400ms | - | - |
Retry example: longer total timeout
This example is similar to the first retry example, but has a longer total timeout to showcase an additional retry attempt and the capped RPC Timeout for the last retry attempt.
RetrySettings.newBuilder()
.setInitialRetryDelayDuration(Duration.ofMillis(200L))
.setRetryDelayMultiplier(2.0)
.setMaxRetryDelayDuration(Duration.ofMillis(500L))
.setInitialRpcTimeoutDuration(Duration.ofMillis(1500L))
.setRpcTimeoutMultiplier(2.0)
.setMaxRpcTimeoutDuration(Duration.ofMillis(3000L))
.setTotalTimeoutDuration(Duration.ofMillis(10000L))
.build();
The following table shows the attempts:
Attempt Number | RPC Timeout | Retry Delay | Call Invoked | Call Ended |
---|---|---|---|---|
1 | 1500ms | 0ms | 0ms | 1500ms |
2 (Retry) | 3000ms | 200ms | 1700ms | 4700ms |
3 (Retry) | 3000ms | 400ms | 5100ms | 8100ms |
4 (Retry) | 1400ms | 500ms | 8600ms | 10000ms |
The RPC Timeout for the third attempt is limited by the Max RPC Timeout value. Using the multiplier (2.0) with the previous timeout value (3000ms) results in an RPC Timeout of 6000ms, which is capped at the Max RPC Timeout (3000ms). The RPC Timeout for the fourth attempt is limited by the Total Timeout value. The RPC Timeout shouldn't exceed the Total Timeout and is reduced to be the "time left" (10000 - 8600 = 1400).
Retry example: capped RPC timeout
RetrySettings defaultRetrySettings =
RetrySettings.newBuilder()
.setInitialRetryDelayDuration(Duration.ofMillis(200L))
.setRetryDelayMultiplier(2.0)
.setMaxRetryDelayDuration(Duration.ofMillis(500L))
.setInitialRpcTimeoutDuration(Duration.ofMillis(500L))
.setRpcTimeoutMultiplier(2.0)
.setMaxRpcTimeoutDuration(Duration.ofMillis(2000L))
.setTotalTimeoutDuration(Duration.ofMillis(4000L))
.build();
The following table shows the attempts:
Attempt Number | RPC Timeout | Retry Delay | Call Invoked | Call Ended |
---|---|---|---|---|
1 | 500ms | 0ms | 0ms | 500ms |
2 (Retry) | 1000ms | 200ms | 700ms | 1700ms |
3 (Retry) | 1900ms | 400ms | 2100ms | 4000ms |
This is another example where the RPC Timeout is capped so that it doesn't exceed the total timeout.
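The attempt table for the capped RPC timeout example can be reproduced with a small deterministic simulation. This is a sketch under simplifying assumptions, not the library's retry algorithm: jitter is ignored, every attempt is assumed to run until its RPC timeout expires, and the `RetryTimeline` class and its parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: builds an attempt table without jitter. */
final class RetryTimeline {

  /** One row of the attempt table. All values are in milliseconds. */
  static final class Attempt {
    final int number;
    final long rpcTimeoutMs;
    final long retryDelayMs;
    final long invokedAtMs;
    final long endedAtMs;

    Attempt(int number, long rpcTimeoutMs, long retryDelayMs, long invokedAtMs, long endedAtMs) {
      this.number = number;
      this.rpcTimeoutMs = rpcTimeoutMs;
      this.retryDelayMs = retryDelayMs;
      this.invokedAtMs = invokedAtMs;
      this.endedAtMs = endedAtMs;
    }

    @Override
    public String toString() {
      return String.format(
          "%d | %dms | %dms | %dms | %dms |",
          number, rpcTimeoutMs, retryDelayMs, invokedAtMs, endedAtMs);
    }
  }

  static List<Attempt> simulate(
      long initialDelayMs, double delayMultiplier, long maxDelayMs,
      long initialRpcTimeoutMs, double rpcTimeoutMultiplier, long maxRpcTimeoutMs,
      long totalTimeoutMs) {
    if (initialDelayMs <= 0 || initialRpcTimeoutMs <= 0
        || delayMultiplier < 1.0 || rpcTimeoutMultiplier < 1.0) {
      throw new IllegalArgumentException("this sketch expects positive values and multipliers >= 1.0");
    }
    List<Attempt> attempts = new ArrayList<>();
    long now = 0;
    long delay = 0; // The first attempt is sent immediately.
    long rpcTimeout = initialRpcTimeoutMs;
    for (int number = 1; ; number++) {
      long invokedAt = now + delay;
      if (invokedAt >= totalTimeoutMs) {
        break; // No time is left, so the attempt isn't made.
      }
      // An attempt can't run past the total timeout: cap it to the time left.
      long timeout = Math.min(rpcTimeout, totalTimeoutMs - invokedAt);
      long endedAt = invokedAt + timeout;
      attempts.add(new Attempt(number, timeout, delay, invokedAt, endedAt));
      now = endedAt;
      // Compute the delay and timeout for the next attempt, each capped at its maximum.
      delay = (number == 1)
          ? initialDelayMs
          : Math.min((long) (delay * delayMultiplier), maxDelayMs);
      rpcTimeout = Math.min((long) (rpcTimeout * rpcTimeoutMultiplier), maxRpcTimeoutMs);
    }
    return attempts;
  }

  public static void main(String[] args) {
    // Values from the capped RPC timeout example.
    for (Attempt attempt : simulate(200L, 2.0, 500L, 500L, 2.0, 2000L, 4000L)) {
      System.out.println(attempt);
    }
  }
}
```

With the values from the capped RPC timeout example, the simulation yields three attempts, and the third attempt's timeout is reduced to the 1900ms that remain. Real timings differ because the library applies jitter and because attempts usually finish before their timeout.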
How to configure custom retry parameters for an RPC
The following example uses the Java-Asset client library:
1. Create the RetrySettings class with your custom configurations:
    RetrySettings customRetrySettings =
        RetrySettings.newBuilder()
            // ... Retry Configurations
            .build();
    RetrySettings customRetrySettings2 =
        RetrySettings.newBuilder()
            // ... Retry Configurations
            .build();
2. Create the StubSettings.Builder for your client and configure it for the RPC:
    AssetServiceStubSettings.Builder assetStubSettingsBuilder = AssetServiceStubSettings.newBuilder();
    assetStubSettingsBuilder
        .exportAssetsSettings()
        // Set your custom Retry Settings
        .setRetrySettings(customRetrySettings)
        // Set your custom Retryable Codes
        .setRetryableCodes(ImmutableSet.of(StatusCode.Code.DEADLINE_EXCEEDED));
   The code snippet provided is setting custom retry configurations for AssetServiceClient's ExportAssets RPC. It configures the ExportAssets RPC to use the retry settings configured in customRetrySettings and sets the codes that can be retried to be DEADLINE_EXCEEDED.
3. Create the settings for the client as assetSettings:
    AssetServiceSettings assetSettings = AssetServiceSettings.create(assetStubSettingsBuilder.build());
4. Create the client with the settings as assetClient:
    try (AssetServiceClient assetClient = AssetServiceClient.create(assetSettings)) {
      ...
    }
Repeat step 2 for each RPC that you want to configure. For example:
AssetServiceStubSettings.Builder assetStubSettingsBuilder = AssetServiceStubSettings.newBuilder();
// Modify the retry params for ExportAssets RPC
assetStubSettingsBuilder
.exportAssetsSettings()
.setRetrySettings(customRetrySettings)
.setRetryableCodes(ImmutableSet.of(StatusCode.Code.DEADLINE_EXCEEDED));
// Modify the retry params for ListAssets RPC
assetStubSettingsBuilder
.listAssetsSettings()
.setRetrySettings(customRetrySettings2)
.setRetryableCodes(ImmutableSet.of(StatusCode.Code.UNAVAILABLE));
FAQ
The following are commonly asked questions regarding client retry behavior.
I expected X retry attempts, but it attempted Y times.
Unless you explicitly specify the number of max attempts (along with disabling the timeout configurations), you might not consistently see the same number of retry attempts made. Jitter random values for RPC delay make it difficult to predict when the request is actually sent.
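To see why the attempt count varies, the following sketch counts how many attempts fit inside a total timeout when each retry delay is a random value in [1, RETRY_DELAY]. It is a simplified model and not the library's algorithm: the `JitterAttemptCount` class is hypothetical, and it assumes that every attempt fails after a fixed latency (`attemptLatencyMs`, a made-up parameter).

```java
import java.util.Random;

/** Illustrative sketch only: counts attempts under jittered backoff. */
final class JitterAttemptCount {

  static int countAttempts(
      Random random,
      long attemptLatencyMs,
      long initialDelayMs,
      double multiplier,
      long maxDelayMs,
      long totalTimeoutMs) {
    long now = 0;
    long delayCap = initialDelayMs;
    int attempts = 0;
    while (true) {
      attempts++;
      now += attemptLatencyMs; // The attempt runs and fails.
      // Jittered delay in [1, delayCap].
      long jitteredDelay = 1 + random.nextInt(Math.toIntExact(delayCap));
      if (now + jitteredDelay > totalTimeoutMs) {
        return attempts; // The next attempt would start after the total timeout.
      }
      now += jitteredDelay;
      delayCap = Math.min((long) (delayCap * multiplier), maxDelayMs);
    }
  }

  public static void main(String[] args) {
    // Same settings, different random seeds: the attempt count can differ per run.
    for (long seed = 1; seed <= 3; seed++) {
      int attempts = countAttempts(new Random(seed), 50L, 100L, 2.0, 500L, 2000L);
      System.out.println("seed " + seed + ": " + attempts + " attempts");
    }
  }
}
```

In this model, with a 50ms attempt latency and a 2000ms total timeout, the count can fall anywhere between 6 attempts (every delay at its cap) and 40 attempts (every delay at 1ms).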
The RPC returned a failure before the Total Timeout value was reached.
The retry algorithm will calculate the jittered retry delay value during each retry attempt. The calculated retry delay will be scheduled to run in the future (that is, currentTime() + jitteredRetryDelay). If the scheduled attempt time exceeds the total timeout, the final retry attempt won't be made.
I configured custom settings and am seeing quota issues.
You may have configured RetrySettings to run too aggressively. The default retry values are chosen by the team operating the service.
Consider increasing the retry delay (initial retry delay and retry multiplier) so that the retry attempts are spaced out and less frequent. Note that this can result in a slower response.
Your use case can require a quicker response or more frequent retry attempts, or both. If that is the case, try to increase the quota limits.