Trusted Cloud by S3NS APIs use long-running operations (LROs) for calls expected to
take significant time to complete (for example, provisioning a
Compute Engine instance or initializing a Dataflow pipeline).
These APIs don't keep an active long-lived connection or block while the task
runs. For LRO APIs, the Cloud Client Libraries for Java return a future for you
to check later.
Determining if an API is an LRO
There are two main ways to determine if an API is an LRO:
- LRO APIs have either the suffix Async (for example, createClusterAsync) or
  the suffix OperationCallable (for example, createClusterOperationCallable).
- LRO APIs return either an OperationFuture or an OperationCallable.
The following snippet shows the two variations, using Java-Dataproc as an
example:
// Async suffix (#1) returns OperationFuture (#2)
public final OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsync(CreateClusterRequest request)
// OperationCallable suffix (#1) returns OperationCallable (#2)
public final OperationCallable<CreateClusterRequest, Cluster, ClusterOperationMetadata> createClusterOperationCallable()
These are two variations of the same API, not two different APIs (both calls
create a Dataproc cluster). The Async variant is recommended.
High-level flow of an LRO
LRO APIs are essentially an initial request call followed by a series of small polling calls. The initial call sends the request and creates an "operation" on the server. All subsequent polling calls to the server track the status of the operation. If the operation is finished, the response is returned. Otherwise, an incomplete status is returned and the client library determines whether to poll again.
By default, the client handles the polling logic, and you don't need to configure the polling mechanism unless you have specific requirements.
From your perspective, the call runs in the background until a response is received. The polling calls and timeout configurations have default values that are pre-configured by the service team based on the expected time for their APIs. These configurations control many factors, such as how often to poll and how long to wait before giving up.
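The flow described above can be illustrated with a small stand-in that uses only the JDK. This is a minimal sketch, not the client library's implementation: `LroFlowSketch`, `FakeOperation`, and `waitFor` are hypothetical names invented for illustration, and the "server" is simulated by a counter that reports completion after a fixed number of polls.

```java
import java.util.Optional;

/** Illustrative stand-in for the LRO flow; none of these names come from the client libraries. */
final class LroFlowSketch {

  /** Hypothetical server-side operation that finishes after a fixed number of polls. */
  static final class FakeOperation {
    private final int pollsUntilDone;
    private int pollCount = 0;

    FakeOperation(int pollsUntilDone) {
      this.pollsUntilDone = pollsUntilDone;
    }

    /** One polling call: returns the response if finished, otherwise an empty (incomplete) status. */
    Optional<String> poll() {
      pollCount++;
      return pollCount >= pollsUntilDone ? Optional.of("cluster-ready") : Optional.empty();
    }

    int pollCount() {
      return pollCount;
    }
  }

  /** Polls until the operation reports a response or maxPolls is reached. */
  static Optional<String> waitFor(FakeOperation operation, int maxPolls) {
    for (int i = 0; i < maxPolls; i++) {
      Optional<String> response = operation.poll();
      if (response.isPresent()) {
        return response;
      }
      // A real client would wait for the current poll delay before polling again.
    }
    return Optional.empty(); // Gave up before the operation finished.
  }

  public static void main(String[] args) {
    // Stands in for the initial request that creates the operation on the server.
    FakeOperation operation = new FakeOperation(3);
    System.out.println(waitFor(operation, 10).orElse("gave up"));
    System.out.println("polls: " + operation.pollCount());
  }
}
```

In the real libraries this loop is hidden behind the returned future, so application code only sees the final response or an exception.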
The Cloud Client Libraries for Java provide an interface for interacting with
the LRO using OperationFuture.
The following snippet shows how to call an operation and wait for a response,
using Java-Dataproc as an example:
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create()) {
  CreateClusterRequest request =
      CreateClusterRequest.newBuilder().build();
  OperationFuture<Cluster, ClusterOperationMetadata> future =
      clusterControllerClient.createClusterAsync(request);
  // Blocks until there is a response
  Cluster response = future.get();
} catch (CancellationException e) {
  // Exceeded the timeout without the Operation completing.
  // Library is no longer polling for the Operation's status.
}
Default LRO values
You can find the default values within each client's StubSettings class. The
initDefaults() method initializes the LRO settings inside the nested Builder
class.
For example, in Java-Aiplatform v3.24.0, the deployModel LRO call has the
following default parameters:
OperationTimedPollAlgorithm.create(
    RetrySettings.newBuilder()
        .setInitialRetryDelayDuration(Duration.ofMillis(5000L))
        .setRetryDelayMultiplier(1.5)
        .setMaxRetryDelayDuration(Duration.ofMillis(45000L))
        .setTotalTimeoutDuration(Duration.ofMillis(300000L))
        .setInitialRpcTimeoutDuration(Duration.ZERO) // not used
        .setRpcTimeoutMultiplier(1.0) // not used
        .setMaxRpcTimeoutDuration(Duration.ZERO) // not used
        .build());
Both retries and LROs share the same RetrySettings class. The following table
shows the mapping between the fields inside RetrySettings and the LRO
functionality:
RetrySettings | Description |
---|---|
InitialRetryDelay | Initial delay before the first poll. |
MaxRetryDelay | Maximum delay between polls. |
RetryDelayMultiplier | Multiplier applied to the delay between consecutive polls. |
TotalTimeoutDuration | Maximum time allowed for the long-running operation. |
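To see how these four fields interact, the following sketch computes an approximate poll schedule from them using only the JDK. It is an illustration under simplifying assumptions, not the library's algorithm: `PollScheduleSketch` and `pollDelays` are hypothetical names, the time spent in each polling RPC is ignored, and any jitter the library might apply is not modeled.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that approximates a poll schedule; not part of the client libraries. */
final class PollScheduleSketch {

  /**
   * Returns the delay before each poll. Each delay is the previous one times the multiplier,
   * capped at maxDelay. Stops once the next delay would push the cumulative wait past
   * totalTimeout.
   */
  static List<Duration> pollDelays(
      Duration initialDelay, double multiplier, Duration maxDelay, Duration totalTimeout) {
    List<Duration> delays = new ArrayList<>();
    long delayMs = initialDelay.toMillis();
    long elapsedMs = 0;
    while (elapsedMs + delayMs <= totalTimeout.toMillis()) {
      delays.add(Duration.ofMillis(delayMs));
      elapsedMs += delayMs;
      // Grow the delay for the next poll, truncating to whole milliseconds.
      delayMs = Math.min((long) (delayMs * multiplier), maxDelay.toMillis());
    }
    return delays;
  }

  public static void main(String[] args) {
    // Values taken from the deployModel defaults shown earlier.
    List<Duration> delays =
        pollDelays(
            Duration.ofMillis(5000L), 1.5, Duration.ofMillis(45000L), Duration.ofMillis(300000L));
    for (int i = 0; i < delays.size(); i++) {
      System.out.println("poll " + (i + 1) + " after " + delays.get(i).toMillis() + " ms");
    }
  }
}
```

Under these simplifying assumptions, the deployModel defaults yield 10 polls, with the delay reaching the 45-second cap from the seventh poll onward. Actual behavior in the library can differ because each polling call also takes time.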
When to configure LRO values
The main reason to manually configure the LRO values is to adjust polling behavior when operations time out. The service team configures the default values as an estimate, but certain factors might still result in occasional timeouts.
To reduce the number of timeouts, increase the total timeout value. Increasing the other values can also help, but test any change to confirm that it produces the expected behavior.
How to configure LRO values
To configure the LRO values, create an OperationTimedPollAlgorithm object and
update the polling algorithm for a specific LRO. The following snippet uses
Java-Dataproc as an example:
ClusterControllerSettings.Builder settingsBuilder = ClusterControllerSettings.newBuilder();
// Create a new OperationTimedPollAlgorithm object
TimedRetryAlgorithm timedRetryAlgorithm = OperationTimedPollAlgorithm.create(
    RetrySettings.newBuilder()
        .setInitialRetryDelayDuration(Duration.ofMillis(500L))
        .setRetryDelayMultiplier(1.5)
        .setMaxRetryDelayDuration(Duration.ofMillis(5000L))
        .setTotalTimeoutDuration(Duration.ofHours(24L))
        .build());
// Set the new polling settings for the specific LRO API
settingsBuilder.createClusterOperationSettings().setPollingAlgorithm(timedRetryAlgorithm);
ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settingsBuilder.build());
This configuration only modifies the LRO values for the createClusterOperation
RPC. The other RPCs in the client still use their pre-configured LRO values
unless you also modify them.
LRO timeouts
The library continues to poll as long as the total timeout has not been
exceeded. If the total timeout is exceeded, the library throws a
java.util.concurrent.CancellationException with the message "Task was
cancelled."
A CancellationException doesn't mean that the backend Trusted Cloud by S3NS
task was cancelled. The client library throws this exception when a call has
exceeded the total timeout without receiving a response. Even if the task
completes immediately after the timeout, the client library won't see the
response.
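Because the exception only reflects the client giving up, it can help to handle it separately from a genuine operation failure. The following is a minimal JDK-only sketch of that distinction: `LroTimeoutHandlingSketch` and `describeOutcome` are hypothetical names, and a cancelled `CompletableFuture` stands in for a future whose polling has timed out. It assumes the future you hold behaves like a standard `java.util.concurrent.Future`.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper showing one way to tell a client-side timeout from a failure. */
final class LroTimeoutHandlingSketch {

  static String describeOutcome(Future<String> future) {
    try {
      return "completed: " + future.get();
    } catch (CancellationException e) {
      // The client stopped polling; the backend task may still be running or may have finished.
      return "client stopped waiting; backend state unknown";
    } catch (ExecutionException e) {
      // The operation itself reported an error.
      return "operation failed: " + e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // Preserve the interrupt for callers.
      return "interrupted while waiting";
    }
  }

  public static void main(String[] args) {
    // Stand-in for a future whose polling exceeded the total timeout.
    CompletableFuture<String> timedOut = new CompletableFuture<>();
    timedOut.cancel(true);
    System.out.println(describeOutcome(timedOut));

    // Stand-in for an operation that finished normally.
    System.out.println(describeOutcome(CompletableFuture.completedFuture("cluster-1")));
  }
}
```

After a timeout, one reasonable follow-up is to check the resource's state through a separate read call before retrying, since the backend task might have completed.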