Class GkeInferenceQuickstartClient (3.2.0-rc)

The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators.

These profiles help generate optimized best practices for running inference on GKE.

Equality

Instances of this class created via copy-construction or copy-assignment always compare equal. Instances created with equal std::shared_ptr<*Connection> objects compare equal. Objects that compare equal share the same underlying resources.

Performance

Creating a new instance of this class is a relatively expensive operation because new objects establish new connections to the service. In contrast, copy-construction, move-construction, and the corresponding assignment operations are relatively efficient because the copies share all underlying resources.
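The sharing behavior described under Equality and Performance can be illustrated with a minimal sketch. It does not use the real library: FakeConnection and FakeClient are hypothetical stand-ins that only mimic a client holding a std::shared_ptr to its connection, and the real class may be implemented differently.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for GkeInferenceQuickstartConnection.
struct FakeConnection {};

// Hypothetical stand-in for the client: it only holds a shared connection.
class FakeClient {
 public:
  explicit FakeClient(std::shared_ptr<FakeConnection> connection)
      : connection_(std::move(connection)) {
    assert(connection_ != nullptr);  // a client needs a connection
  }

  // Two clients are equal when they hold the same connection object.
  friend bool operator==(FakeClient const& a, FakeClient const& b) {
    return a.connection_ == b.connection_;
  }
  friend bool operator!=(FakeClient const& a, FakeClient const& b) {
    return !(a == b);
  }

  // Number of owners currently sharing the connection.
  long SharedOwners() const { return connection_.use_count(); }

 private:
  std::shared_ptr<FakeConnection> connection_;
};

// Returns true if a copy of `client` compares equal to it and shares its
// connection instead of creating a new one.
bool CopySharesConnection(FakeClient const& client) {
  long const before = client.SharedOwners();
  FakeClient copy = client;  // cheap: only the shared_ptr is copied
  return copy == client && client.SharedOwners() == before + 1;
}
```

In this sketch, two clients built from separate connections compare unequal, while any copy compares equal to its source. The same reasoning suggests creating one client and handing copies to the code that needs them.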

Thread Safety

Concurrent access to different instances of this class, even if they compare equal, is guaranteed to work. Having two or more threads operate on the same instance of this class is not guaranteed to work. Since copy-construction and move-construction are relatively efficient operations, consider giving each thread its own copy when using this class from multiple threads.

Constructors

GkeInferenceQuickstartClient(GkeInferenceQuickstartClient const &)

Copy and move support

Parameter
Name Description
GkeInferenceQuickstartClient const &

GkeInferenceQuickstartClient(GkeInferenceQuickstartClient &&)

Copy and move support

Parameter
Name Description
GkeInferenceQuickstartClient &&

GkeInferenceQuickstartClient(std::shared_ptr< GkeInferenceQuickstartConnection >, Options)

Parameters
Name Description
connection std::shared_ptr< GkeInferenceQuickstartConnection >
opts Options

Operators

operator=(GkeInferenceQuickstartClient const &)

Copy and move support

Parameter
Name Description
GkeInferenceQuickstartClient const &
Returns
Type Description
GkeInferenceQuickstartClient &

operator=(GkeInferenceQuickstartClient &&)

Copy and move support

Parameter
Name Description
GkeInferenceQuickstartClient &&
Returns
Type Description
GkeInferenceQuickstartClient &

Functions

FetchModels(google::cloud::gkerecommender::v1::FetchModelsRequest, Options)

Fetches available models.

Open-source models follow the Hugging Face Hub owner/model_name format.

Parameters
Name Description
request google::cloud::gkerecommender::v1::FetchModelsRequest

Unary RPCs, such as the one wrapped by this function, receive a single request proto message which includes all the inputs for the RPC. In this case, the proto message is a google.cloud.gkerecommender.v1.FetchModelsRequest. Proto messages are converted to C++ classes by Protobuf, using the Protobuf mapping rules.

opts Options

Optional. Override the class-level options, such as retry and backoff policies.

Returns
Type Description
StreamRange< std::string >

A StreamRange to iterate over the results. See the documentation of this type for details. In brief, this class has begin() and end() member functions returning an iterator class meeting the input iterator requirements. The value type for this iterator is a StatusOr because the iteration may fail even after some values are retrieved successfully, for example, if there is a network disconnect. An empty set of results does not indicate an error; it indicates that there are no resources meeting the request criteria. On a successful iteration the StatusOr<T> contains a std::string.
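The iteration contract above (each element is a StatusOr, and a failure can appear after some values were already read) can be sketched as follows. This is an illustration under stated assumptions, not library code: FakeStatus and FakeStatusOr are hypothetical stand-ins that copy only the small part of the StatusOr interface used here (conversion to bool, status().message(), and operator*), and CollectNames is a helper invented for this example. The loop body is written so that it should also apply to the range returned by FetchModels, but that has not been verified here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for google::cloud::Status (message only).
struct FakeStatus {
  std::string text;
  std::string const& message() const { return text; }
};

// Hypothetical stand-in for StatusOr<std::string>: a value or an error.
class FakeStatusOr {
 public:
  static FakeStatusOr Ok(std::string value) {
    FakeStatusOr r;
    r.value_ = std::move(value);
    r.ok_ = true;
    return r;
  }
  static FakeStatusOr Error(std::string message) {
    FakeStatusOr r;
    r.status_.text = std::move(message);
    r.ok_ = false;
    return r;
  }
  explicit operator bool() const { return ok_; }
  FakeStatus const& status() const { return status_; }
  std::string& operator*() {
    assert(ok_);  // dereferencing an error is a programming mistake
    return value_;
  }

 private:
  FakeStatusOr() = default;
  std::string value_;
  FakeStatus status_;
  bool ok_ = false;
};

// Names read before the range ended, plus the error (if any) that stopped it.
struct Collected {
  std::vector<std::string> names;
  std::string error;  // empty when the iteration completed successfully
};

// Reads elements until the range ends or an element holds an error.
template <typename Range>
Collected CollectNames(Range&& range) {
  Collected out;
  for (auto& item : range) {
    if (!item) {  // the stream failed part-way through
      out.error = item.status().message();
      break;
    }
    out.names.push_back(std::move(*item));
  }
  return out;
}
```

Returning both the partial results and the error lets the caller decide whether values read before a failure are still useful. An empty `names` with an empty `error` corresponds to the "no matching resources" case, which is not an error.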

FetchModelServers(google::cloud::gkerecommender::v1::FetchModelServersRequest, Options)

Fetches available model servers.

Open-source model servers use simplified, lowercase names (e.g., vllm).

Parameters
Name Description
request google::cloud::gkerecommender::v1::FetchModelServersRequest

Unary RPCs, such as the one wrapped by this function, receive a single request proto message which includes all the inputs for the RPC. In this case, the proto message is a google.cloud.gkerecommender.v1.FetchModelServersRequest. Proto messages are converted to C++ classes by Protobuf, using the Protobuf mapping rules.

opts Options

Optional. Override the class-level options, such as retry and backoff policies.

Returns
Type Description
StreamRange< std::string >

A StreamRange to iterate over the results. See the documentation of this type for details. In brief, this class has begin() and end() member functions returning an iterator class meeting the input iterator requirements. The value type for this iterator is a StatusOr because the iteration may fail even after some values are retrieved successfully, for example, if there is a network disconnect. An empty set of results does not indicate an error; it indicates that there are no resources meeting the request criteria. On a successful iteration the StatusOr<T> contains a std::string.

FetchModelServerVersions(google::cloud::gkerecommender::v1::FetchModelServerVersionsRequest, Options)

Fetches available model server versions.

Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Parameters
Name Description
request google::cloud::gkerecommender::v1::FetchModelServerVersionsRequest

Unary RPCs, such as the one wrapped by this function, receive a single request proto message which includes all the inputs for the RPC. In this case, the proto message is a google.cloud.gkerecommender.v1.FetchModelServerVersionsRequest. Proto messages are converted to C++ classes by Protobuf, using the Protobuf mapping rules.

opts Options

Optional. Override the class-level options, such as retry and backoff policies.

Returns
Type Description
StreamRange< std::string >

A StreamRange to iterate over the results. See the documentation of this type for details. In brief, this class has begin() and end() member functions returning an iterator class meeting the input iterator requirements. The value type for this iterator is a StatusOr because the iteration may fail even after some values are retrieved successfully, for example, if there is a network disconnect. An empty set of results does not indicate an error; it indicates that there are no resources meeting the request criteria. On a successful iteration the StatusOr<T> contains a std::string.

FetchProfiles(google::cloud::gkerecommender::v1::FetchProfilesRequest, Options)

Fetches available profiles.

A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.

Parameters
Name Description
request google::cloud::gkerecommender::v1::FetchProfilesRequest

Unary RPCs, such as the one wrapped by this function, receive a single request proto message which includes all the inputs for the RPC. In this case, the proto message is a google.cloud.gkerecommender.v1.FetchProfilesRequest. Proto messages are converted to C++ classes by Protobuf, using the Protobuf mapping rules.

opts Options

Optional. Override the class-level options, such as retry and backoff policies.

Returns
Type Description
StreamRange< google::cloud::gkerecommender::v1::Profile >

A StreamRange to iterate over the results. See the documentation of this type for details. In brief, this class has begin() and end() member functions returning an iterator class meeting the input iterator requirements. The value type for this iterator is a StatusOr because the iteration may fail even after some values are retrieved successfully, for example, if there is a network disconnect. An empty set of results does not indicate an error; it indicates that there are no resources meeting the request criteria. On a successful iteration the StatusOr<T> contains elements of type google.cloud.gkerecommender.v1.Profile, or rather, the C++ class generated by Protobuf from that type. Please consult the Protobuf documentation for details on the Protobuf mapping rules.

GenerateOptimizedManifest(google::cloud::gkerecommender::v1::GenerateOptimizedManifestRequest const &, Options)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations.

Parameters
Name Description
request google::cloud::gkerecommender::v1::GenerateOptimizedManifestRequest const &

Unary RPCs, such as the one wrapped by this function, receive a single request proto message which includes all the inputs for the RPC. In this case, the proto message is a google.cloud.gkerecommender.v1.GenerateOptimizedManifestRequest. Proto messages are converted to C++ classes by Protobuf, using the Protobuf mapping rules.

opts Options

Optional. Override the class-level options, such as retry and backoff policies.

Returns
Type Description
StatusOr< google::cloud::gkerecommender::v1::GenerateOptimizedManifestResponse >

The result of the RPC. The response message type (google.cloud.gkerecommender.v1.GenerateOptimizedManifestResponse) is mapped to a C++ class using the Protobuf mapping rules. If the request fails, the StatusOr contains the error details.
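For functions that return a single StatusOr rather than a range, the usual pattern is to check for success before touching the value. The sketch below shows that pattern with hypothetical stand-ins: FakeStatus and FakeStatusOr imitate only ok(), status().message(), and operator*, the payload is a plain std::string rather than the real response message, and ValueOrFallback is a helper invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for google::cloud::Status (message only).
struct FakeStatus {
  std::string text;
  std::string const& message() const { return text; }
};

// Hypothetical stand-in for google::cloud::StatusOr<T>.
template <typename T>
class FakeStatusOr {
 public:
  FakeStatusOr(T value) : value_(std::move(value)), ok_(true) {}
  FakeStatusOr(FakeStatus status) : status_(std::move(status)), ok_(false) {}

  bool ok() const { return ok_; }
  FakeStatus const& status() const { return status_; }
  T const& operator*() const {
    assert(ok_);  // only valid when the call succeeded
    return value_;
  }

 private:
  T value_{};
  FakeStatus status_;
  bool ok_;
};

// Returns the payload on success. On failure, stores the error message in
// `error_out` and returns `fallback` instead.
template <typename Result, typename T>
T ValueOrFallback(Result const& result, T fallback, std::string& error_out) {
  if (!result.ok()) {
    error_out = result.status().message();
    return fallback;
  }
  return *result;
}
```

Checking ok() first keeps the error details available for logging or retry decisions, instead of losing them by dereferencing a failed result.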

FetchBenchmarkingData(google::cloud::gkerecommender::v1::FetchBenchmarkingDataRequest const &, Options)

Fetches all of the benchmarking data available for a profile.

Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.

Parameters
Name Description
request google::cloud::gkerecommender::v1::FetchBenchmarkingDataRequest const &

Unary RPCs, such as the one wrapped by this function, receive a single request proto message which includes all the inputs for the RPC. In this case, the proto message is a google.cloud.gkerecommender.v1.FetchBenchmarkingDataRequest. Proto messages are converted to C++ classes by Protobuf, using the Protobuf mapping rules.

opts Options

Optional. Override the class-level options, such as retry and backoff policies.

Returns
Type Description
StatusOr< google::cloud::gkerecommender::v1::FetchBenchmarkingDataResponse >

The result of the RPC. The response message type (google.cloud.gkerecommender.v1.FetchBenchmarkingDataResponse) is mapped to a C++ class using the Protobuf mapping rules. If the request fails, the StatusOr contains the error details.