Reference documentation and code samples for the GKE Recommender V1 API class Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.
Client for the GkeInferenceQuickstart service.
GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.
Inherits
- Object
Methods
.configure
def self.configure() { |config| ... } -> Client::Configuration
Configure the GkeInferenceQuickstart Client class.
See Configuration for a description of the configuration fields.
- (config) — Configure the Client client.
- config (Client::Configuration)
# Modify the configuration for all GkeInferenceQuickstart clients
::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.configure do |config|
  config.timeout = 10.0
end
#configure
def configure() { |config| ... } -> Client::Configuration
Configure the GkeInferenceQuickstart Client instance.
The configuration is set to the derived mode, meaning that values can be changed, but structural changes (adding new fields, etc.) are not allowed. Structural changes should be made on Client.configure.
See Configuration for a description of the configuration fields.
- (config) — Configure the Client client.
- config (Client::Configuration)
#fetch_benchmarking_data
def fetch_benchmarking_data(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
def fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.
def fetch_benchmarking_data(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Pass arguments to fetch_benchmarking_data via a request object, either of type FetchBenchmarkingDataRequest or an equivalent Hash.
- request (::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Pass arguments to fetch_benchmarking_data via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
- model_server_info (::Google::Cloud::GkeRecommender::V1::ModelServerInfo, ::Hash) — Required. The model server configuration to get benchmarking data for. Use GkeInferenceQuickstart.FetchProfiles to find valid configurations.
- instance_type (::String) — Optional. The instance type to filter benchmarking data. Instance types are in the format a2-highgpu-1g. If not provided, all instance types for the given profile's model_server_info will be returned. Use GkeInferenceQuickstart.FetchProfiles to find available instance types.
- pricing_model (::String) — Optional. The pricing model to use for the benchmarking data. Defaults to spot.
- (response, operation) — Access the result along with the RPC operation
- response (::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse)
- operation (::GRPC::ActiveCall::Operation)
- (::Google::Cloud::Error) — if the RPC is aborted.
Basic example
require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest.new

# Call the fetch_benchmarking_data method.
result = client.fetch_benchmarking_data request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse.
p result
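For the keyword-argument form, the call parameters can be sketched as a plain Hash. This is a hedged illustration: the model and server names are hypothetical, and the nested field names inside model_server_info are assumptions — use GkeInferenceQuickstart.FetchProfiles to discover valid configurations.

```ruby
# Sketch of the keyword arguments accepted by fetch_benchmarking_data.
# All values are hypothetical examples, not verified configurations.
request_fields = {
  model_server_info: {                # nested field names are assumptions
    model: "meta-llama/Llama-3.1-8B",
    model_server: "vllm"
  },
  instance_type: "a2-highgpu-1g",     # optional; filters by instance type
  pricing_model: "spot"               # optional; "spot" is the default
}

# With a real client, the call would look like:
#   result = client.fetch_benchmarking_data(**request_fields)
p request_fields[:pricing_model]
```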
#fetch_model_server_versions
def fetch_model_server_versions(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
def fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0). Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.
def fetch_model_server_versions(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Pass arguments to fetch_model_server_versions via a request object, either of type FetchModelServerVersionsRequest or an equivalent Hash.
- request (::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Pass arguments to fetch_model_server_versions via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
- model (::String) — Required. The model for which to list model server versions. Open-source models follow the Huggingface Hub owner/model_name format. Use GkeInferenceQuickstart.FetchModels to find available models.
- model_server (::String) — Required. The model server for which to list versions. Open-source model servers use simplified, lowercase names (e.g., vllm). Use GkeInferenceQuickstart.FetchModelServers to find available model servers.
- page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) — Optional. The value of next_page_token received from a previous FetchModelServerVersionsRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelServerVersionsRequest must match the call that provided the page token.
- (response, operation) — Access the result along with the RPC operation
- response (::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse)
- operation (::GRPC::ActiveCall::Operation)
- (::Google::Cloud::Error) — if the RPC is aborted.
Basic example
require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest.new

# Call the fetch_model_server_versions method.
result = client.fetch_model_server_versions request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse.
p result
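The page_size/page_token parameters follow the usual list-pagination pattern: keep calling with the returned next_page_token until it comes back empty. The loop below is a runnable sketch of that pattern; a stub stands in for the real client, and the page contents and tokens are invented for illustration.

```ruby
# Stub response mirroring the two fields the pagination loop relies on.
StubResponse = Struct.new(:model_server_versions, :next_page_token)

# Stub client returning two canned pages; the real client would call the API.
class StubClient
  PAGES = {
    nil      => StubResponse.new(["v0.9.0"], "page-2"),
    "page-2" => StubResponse.new(["v1.0.0"], "")
  }.freeze

  def fetch_model_server_versions(model:, model_server:, page_token: nil)
    PAGES.fetch(page_token)
  end
end

# Collect every version by following next_page_token until it is empty.
def all_versions(client, model:, model_server:)
  versions = []
  token = nil
  loop do
    response = client.fetch_model_server_versions(
      model: model, model_server: model_server, page_token: token
    )
    versions.concat(response.model_server_versions)
    token = response.next_page_token
    break if token.nil? || token.empty?
  end
  versions
end

p all_versions(StubClient.new, model: "owner/model_name", model_server: "vllm")
# => ["v0.9.0", "v1.0.0"]
```

Per the parameter notes above, every call after the first must repeat the same model and model_server that produced the page token.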
#fetch_model_servers
def fetch_model_servers(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
def fetch_model_servers(model: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).
def fetch_model_servers(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Pass arguments to fetch_model_servers via a request object, either of type FetchModelServersRequest or an equivalent Hash.
- request (::Google::Cloud::GkeRecommender::V1::FetchModelServersRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_model_servers(model: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Pass arguments to fetch_model_servers via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
- model (::String) — Required. The model for which to list model servers. Open-source models follow the Huggingface Hub owner/model_name format. Use GkeInferenceQuickstart.FetchModels to find available models.
- page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) — Optional. The value of next_page_token received from a previous FetchModelServersRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelServersRequest must match the call that provided the page token.
- (response, operation) — Access the result along with the RPC operation
- response (::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse)
- operation (::GRPC::ActiveCall::Operation)
- (::Google::Cloud::Error) — if the RPC is aborted.
Basic example
require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelServersRequest.new

# Call the fetch_model_servers method.
result = client.fetch_model_servers request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelServersResponse.
p result
#fetch_models
def fetch_models(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
def fetch_models(page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Fetches available models. Open-source models follow the Huggingface Hub owner/model_name format.
def fetch_models(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Pass arguments to fetch_models via a request object, either of type FetchModelsRequest or an equivalent Hash.
- request (::Google::Cloud::GkeRecommender::V1::FetchModelsRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_models(page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Pass arguments to fetch_models via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
- page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) — Optional. The value of next_page_token received from a previous FetchModelsRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelsRequest must match the call that provided the page token.
- (response, operation) — Access the result along with the RPC operation
- response (::Google::Cloud::GkeRecommender::V1::FetchModelsResponse)
- operation (::GRPC::ActiveCall::Operation)
- (::Google::Cloud::Error) — if the RPC is aborted.
Basic example
require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelsRequest.new

# Call the fetch_models method.
result = client.fetch_models request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelsResponse.
p result
#fetch_profiles
def fetch_profiles(request, options = nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
def fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.
def fetch_profiles(request, options = nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Pass arguments to fetch_profiles via a request object, either of type FetchProfilesRequest or an equivalent Hash.
- request (::Google::Cloud::GkeRecommender::V1::FetchProfilesRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Pass arguments to fetch_profiles via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
- model (::String) — Optional. The model to filter profiles by. Open-source models follow the Huggingface Hub owner/model_name format. If not provided, all models are returned. Use GkeInferenceQuickstart.FetchModels to find available models.
- model_server (::String) — Optional. The model server to filter profiles by. If not provided, all model servers are returned. Use GkeInferenceQuickstart.FetchModelServers to find available model servers for a given model.
- model_server_version (::String) — Optional. The model server version to filter profiles by. If not provided, all model server versions are returned. Use GkeInferenceQuickstart.FetchModelServerVersions to find available versions for a given model and server.
- performance_requirements (::Google::Cloud::GkeRecommender::V1::PerformanceRequirements, ::Hash) — Optional. The performance requirements to filter profiles. Profiles that do not meet these requirements are filtered out. If not provided, all profiles are returned.
- page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) — Optional. The value of next_page_token received from a previous FetchProfilesRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchProfilesRequest must match the call that provided the page token.
- (response, operation) — Access the result along with the RPC operation
- response (::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>)
- operation (::GRPC::ActiveCall::Operation)
- (::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>)
- (::Google::Cloud::Error) — if the RPC is aborted.
Basic example
require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchProfilesRequest.new

# Call the fetch_profiles method.
result = client.fetch_profiles request

# The returned object is of type Gapic::PagedEnumerable. You can iterate
# over elements, and API calls will be issued to fetch pages as needed.
result.each do |item|
  # Each element is of type ::Google::Cloud::GkeRecommender::V1::Profile.
  p item
end
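The performance_requirements filter is applied server-side, but its semantics ("profiles that do not meet these requirements are filtered out") can be pictured as a predicate over profiles. The sketch below uses plain structs and an invented latency field (ttft_ms) purely for illustration; it is not the real Profile message or the real requirements schema.

```ruby
# Illustrative stand-in for a profile with one performance metric.
# Field names and values here are invented, not from the API.
SketchProfile = Struct.new(:instance_type, :ttft_ms, keyword_init: true)

profiles = [
  SketchProfile.new(instance_type: "a2-highgpu-1g", ttft_ms: 180),
  SketchProfile.new(instance_type: "g2-standard-12", ttft_ms: 420)
]

# A requirement like "TTFT at most 250 ms" prunes profiles that exceed it.
max_ttft_ms = 250
matching = profiles.select { |profile| profile.ttft_ms <= max_ttft_ms }

p matching.map(&:instance_type)
# => ["a2-highgpu-1g"]
```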
#generate_optimized_manifest
def generate_optimized_manifest(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
def generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.
def generate_optimized_manifest(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Pass arguments to generate_optimized_manifest via a request object, either of type Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest or an equivalent Hash.
- request (::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Pass arguments to generate_optimized_manifest via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
- model_server_info (::Google::Cloud::GkeRecommender::V1::ModelServerInfo, ::Hash) — Required. The model server configuration to generate the manifest for. Use GkeInferenceQuickstart.FetchProfiles to find valid configurations.
- accelerator_type (::String) — Required. The accelerator type. Use GkeInferenceQuickstart.FetchProfiles to find valid accelerators for a given model_server_info.
- kubernetes_namespace (::String) — Optional. The Kubernetes namespace to deploy the manifests in.
- performance_requirements (::Google::Cloud::GkeRecommender::V1::PerformanceRequirements, ::Hash) — Optional. The performance requirements to use for generating Horizontal Pod Autoscaler (HPA) resources. If provided, the manifest includes HPA resources to adjust the model server replica count to maintain the specified targets (e.g., NTPOT, TTFT) at a P50 latency. Cost targets are not currently supported for HPA generation. If the specified targets are not achievable, the HPA manifest will not be generated.
- storage_config (::Google::Cloud::GkeRecommender::V1::StorageConfig, ::Hash) — Optional. The storage configuration for the model. If not provided, the model is loaded from Huggingface.
- (response, operation) — Access the result along with the RPC operation
- response (::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse)
- operation (::GRPC::ActiveCall::Operation)
- (::Google::Cloud::Error) — if the RPC is aborted.
Basic example
require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest.new

# Call the generate_optimized_manifest method.
result = client.generate_optimized_manifest request

# The returned object is of type Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse.
p result
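A common next step is writing the returned manifest text to disk so it can be deployed with kubectl. The sketch below does not assume any GenerateOptimizedManifestResponse field names; manifest_content is a stand-in for manifest text you would extract from the response.

```ruby
require "tmpdir"

# Stand-in for manifest text taken from the response; the real response
# field names are not assumed here.
manifest_content = <<~YAML
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: inference-server
YAML

Dir.mktmpdir do |dir|
  path = File.join(dir, "deployment.yaml")
  File.write(path, manifest_content)
  # Deploy with: kubectl apply -f <path> -n <kubernetes_namespace>
  puts "wrote #{File.size(path)} bytes to #{path}"
end
```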
#initialize
def initialize() { |config| ... } -> Client
Create a new GkeInferenceQuickstart client object.
- (config) — Configure the GkeInferenceQuickstart client.
- config (Client::Configuration)
- (Client) — a new instance of Client
# Create a client using the default configuration
client = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a client using a custom configuration
client = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new do |config|
  config.timeout = 10.0
end
#logger
def logger() -> Logger
The logger used for request/response debug logging.
- (Logger)
#universe_domain
def universe_domain() -> String
The effective universe domain
- (String)