GKE Recommender V1 API - Class Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client (v0.1.0)

Reference documentation and code samples for the GKE Recommender V1 API class Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.

Client for the GkeInferenceQuickstart service.

The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.

Inherits

  • Object

Methods

.configure

def self.configure() { |config| ... } -> Client::Configuration

Configure the GkeInferenceQuickstart Client class.

See Configuration for a description of the configuration fields.

Yields
  • (config) — Configure the GkeInferenceQuickstart Client class.
Yield Parameter
  • config (Client::Configuration)
Example
# Modify the configuration for all GkeInferenceQuickstart clients
::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.configure do |config|
  config.timeout = 10.0
end

#configure

def configure() { |config| ... } -> Client::Configuration

Configure the GkeInferenceQuickstart Client instance.

The configuration is set to the derived mode, meaning that values can be changed, but structural changes (adding new fields, etc.) are not allowed. Structural changes should be made on Client.configure.

See Configuration for a description of the configuration fields.

Yields
  • (config) — Configure the GkeInferenceQuickstart Client instance.
Yield Parameter
  • config (Client::Configuration)

#fetch_benchmarking_data

def fetch_benchmarking_data(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
def fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse

Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.

Overloads
def fetch_benchmarking_data(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Pass arguments to fetch_benchmarking_data via a request object, either of type FetchBenchmarkingDataRequest or an equivalent Hash.
Parameters
  • request (::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
  • options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Pass arguments to fetch_benchmarking_data via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters
  • model_server_info (::Google::Cloud::GkeRecommender::V1::ModelServerInfo, ::Hash) — Required. The model server configuration to get benchmarking data for. Use GkeInferenceQuickstart.FetchProfiles to find valid configurations.
  • instance_type (::String) — Optional. The instance type to filter benchmarking data. Instance types are in the format a2-highgpu-1g. If not provided, all instance types for the given profile's model_server_info will be returned. Use GkeInferenceQuickstart.FetchProfiles to find available instance types.
  • pricing_model (::String) — Optional. The pricing model to use for the benchmarking data. Defaults to spot.
Yields
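  • (response, operation) — Access the result along with the RPC operation.
Yield Parameters
  • response (::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse)
  • operation (::GRPC::ActiveCall::Operation)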
Raises
  • (::Google::Cloud::Error) — if the RPC is aborted.
Example

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest.new

# Call the fetch_benchmarking_data method.
result = client.fetch_benchmarking_data request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse.
p result
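
The basic example sends an empty request, although model_server_info is documented as required. The following is a minimal sketch of assembling the keyword arguments as a plain Hash. The helper name build_benchmarking_args is hypothetical, and the ModelServerInfo field names (model, model_server, model_server_version) and the sample values are assumptions that this page does not confirm, so check them against the ModelServerInfo reference before relying on them.

```ruby
# Hypothetical helper: builds keyword arguments for fetch_benchmarking_data.
# Assumption: ModelServerInfo has the fields model, model_server, and
# model_server_version; this page does not list them.
def build_benchmarking_args(model:, model_server:, model_server_version: nil,
                            instance_type: nil, pricing_model: nil)
  info = { model: model, model_server: model_server }
  info[:model_server_version] = model_server_version if model_server_version

  args = { model_server_info: info }
  # Optional filters are left out when nil so the service defaults apply.
  args[:instance_type] = instance_type if instance_type
  args[:pricing_model] = pricing_model if pricing_model
  args
end

args = build_benchmarking_args(
  model: "owner/model_name",      # placeholder in the owner/model_name format
  model_server: "vllm",
  instance_type: "a2-highgpu-1g"
)

# With a real client (requires the gem and credentials):
#   result = client.fetch_benchmarking_data(**args)
```

Leaving pricing_model out of the Hash lets the documented default (spot) apply.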

#fetch_model_server_versions

def fetch_model_server_versions(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
def fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse

Fetches available model server versions. Open-source servers use their own versioning schemes (e.g., vllm uses semver like v1.0.0).

Some model servers use different versioning schemes depending on the accelerator. For example, vllm uses semver on GPUs but nightly build tags on TPUs. When different schemes are present, all available versions are returned.

Overloads
def fetch_model_server_versions(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Pass arguments to fetch_model_server_versions via a request object, either of type FetchModelServerVersionsRequest or an equivalent Hash.
Parameters
  • request (::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
  • options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Pass arguments to fetch_model_server_versions via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters
  • model (::String) — Required. The model for which to list model server versions. Open-source models follow the Hugging Face Hub owner/model_name format. Use GkeInferenceQuickstart.FetchModels to find available models.
  • model_server (::String) — Required. The model server for which to list versions. Open-source model servers use simplified, lowercase names (e.g., vllm). Use GkeInferenceQuickstart.FetchModelServers to find available model servers.
  • page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, the service chooses a default value. The response may contain a partial list, so rely only on the response's next_page_token to determine whether more results remain.
  • page_token (::String) — Optional. The value of next_page_token received from a previous FetchModelServerVersionsRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelServerVersionsRequest must match the call that provided the page token.
Yields
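  • (response, operation) — Access the result along with the RPC operation.
Yield Parameters
  • response (::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse)
  • operation (::GRPC::ActiveCall::Operation)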
Raises
  • (::Google::Cloud::Error) — if the RPC is aborted.
Example

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest.new

# Call the fetch_model_server_versions method.
result = client.fetch_model_server_versions request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse.
p result
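
Because this method returns a single response page, collecting every version means following next_page_token manually, as the page_token description outlines. The sketch below shows one way to do that. The helper name all_model_server_versions is hypothetical, and the response field name model_server_versions is an assumption not confirmed by this page.

```ruby
# Hypothetical helper: collects every version by following next_page_token.
# Assumption: the response exposes its results as model_server_versions.
def all_model_server_versions(client, model:, model_server:, page_size: nil)
  versions = []
  page_token = nil
  loop do
    # Every parameter except page_token must stay the same across pages.
    args = { model: model, model_server: model_server }
    args[:page_size] = page_size if page_size
    args[:page_token] = page_token if page_token

    response = client.fetch_model_server_versions(**args)
    versions.concat response.model_server_versions.to_a

    # An empty token means there are no further pages.
    page_token = response.next_page_token.to_s
    break if page_token.empty?
  end
  versions
end

# With a real client (requires the gem and credentials):
#   all_model_server_versions client, model: "owner/model_name", model_server: "vllm"
```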

#fetch_model_servers

def fetch_model_servers(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
def fetch_model_servers(model: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Overloads
def fetch_model_servers(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Pass arguments to fetch_model_servers via a request object, either of type FetchModelServersRequest or an equivalent Hash.
Parameters
  • request (::Google::Cloud::GkeRecommender::V1::FetchModelServersRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
  • options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_model_servers(model: nil, page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Pass arguments to fetch_model_servers via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters
  • model (::String) — Required. The model for which to list model servers. Open-source models follow the Hugging Face Hub owner/model_name format. Use GkeInferenceQuickstart.FetchModels to find available models.
  • page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, the service chooses a default value. The response may contain a partial list, so rely only on the response's next_page_token to determine whether more results remain.
  • page_token (::String) — Optional. The value of next_page_token received from a previous FetchModelServersRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelServersRequest must match the call that provided the page token.
Yields
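  • (response, operation) — Access the result along with the RPC operation.
Yield Parameters
  • response (::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse)
  • operation (::GRPC::ActiveCall::Operation)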
Raises
  • (::Google::Cloud::Error) — if the RPC is aborted.
Example

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelServersRequest.new

# Call the fetch_model_servers method.
result = client.fetch_model_servers request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelServersResponse.
p result
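
The model parameter is meant to come from FetchModels, so the two calls chain naturally. The following sketch maps each model to its model servers. The helper name model_servers_by_model is hypothetical, and the response field names models and model_servers are assumptions not confirmed by this page. For brevity it reads only the first page of each call.

```ruby
# Hypothetical helper: maps each model to the model servers listed for it.
# Assumptions: FetchModelsResponse exposes models and
# FetchModelServersResponse exposes model_servers. Only the first page of
# each call is read; follow next_page_token for complete results.
def model_servers_by_model(client)
  # An empty Hash is a request object that keeps all default values.
  models = client.fetch_models({}).models.to_a
  models.to_h do |model|
    [model, client.fetch_model_servers(model: model).model_servers.to_a]
  end
end

# With a real client (requires the gem and credentials):
#   model_servers_by_model(client).each { |model, servers| puts "#{model}: #{servers.join ', '}" }
```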

#fetch_models

def fetch_models(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
def fetch_models(page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Overloads
def fetch_models(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Pass arguments to fetch_models via a request object, either of type FetchModelsRequest or an equivalent Hash.
Parameters
  • request (::Google::Cloud::GkeRecommender::V1::FetchModelsRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
  • options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_models(page_size: nil, page_token: nil) -> ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Pass arguments to fetch_models via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters
  • page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, the service chooses a default value. The response may contain a partial list, so rely only on the response's next_page_token to determine whether more results remain.
  • page_token (::String) — Optional. The value of next_page_token received from a previous FetchModelsRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelsRequest must match the call that provided the page token.
Yields
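  • (response, operation) — Access the result along with the RPC operation.
Yield Parameters
  • response (::Google::Cloud::GkeRecommender::V1::FetchModelsResponse)
  • operation (::GRPC::ActiveCall::Operation)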
Raises
  • (::Google::Cloud::Error) — if the RPC is aborted.
Example

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelsRequest.new

# Call the fetch_models method.
result = client.fetch_models request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelsResponse.
p result
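
The method also accepts a block that receives the response together with the RPC operation. The sketch below uses that block form to summarize the first page. The helper name first_page_summary is hypothetical, the response field name models is an assumption not confirmed by this page, and the operation argument is deliberately left unused.

```ruby
# Hypothetical helper: summarizes the first page of models via the block form.
# Assumption: FetchModelsResponse exposes its results as models.
def first_page_summary(client, page_size: 10)
  summary = nil
  client.fetch_models(page_size: page_size) do |response, _operation|
    summary = {
      models: response.models.to_a,
      # A non-empty next_page_token signals that more results remain.
      more_pages: !response.next_page_token.to_s.empty?
    }
  end
  summary
end

# With a real client (requires the gem and credentials):
#   p first_page_summary(client, page_size: 5)
```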

#fetch_profiles

def fetch_profiles(request, options = nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
def fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.

Overloads
def fetch_profiles(request, options = nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Pass arguments to fetch_profiles via a request object, either of type FetchProfilesRequest or an equivalent Hash.
Parameters
  • request (::Google::Cloud::GkeRecommender::V1::FetchProfilesRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
  • options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) -> ::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Pass arguments to fetch_profiles via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters
  • model (::String) — Optional. The model to filter profiles by. Open-source models follow the Hugging Face Hub owner/model_name format. If not provided, all models are returned. Use GkeInferenceQuickstart.FetchModels to find available models.
  • model_server (::String) — Optional. The model server to filter profiles by. If not provided, all model servers are returned. Use GkeInferenceQuickstart.FetchModelServers to find available model servers for a given model.
  • model_server_version (::String) — Optional. The model server version to filter profiles by. If not provided, all model server versions are returned. Use GkeInferenceQuickstart.FetchModelServerVersions to find available versions for a given model and server.
  • performance_requirements (::Google::Cloud::GkeRecommender::V1::PerformanceRequirements, ::Hash) — Optional. The performance requirements to filter profiles. Profiles that do not meet these requirements are filtered out. If not provided, all profiles are returned.
  • page_size (::Integer) — Optional. The target number of results to return in a single response. If not specified, the service chooses a default value. The response may contain a partial list, so rely only on the response's next_page_token to determine whether more results remain.
  • page_token (::String) — Optional. The value of next_page_token received from a previous FetchProfilesRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchProfilesRequest must match the call that provided the page token.
Yields
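  • (response, operation) — Access the result along with the RPC operation.
Yield Parameters
  • response (::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>)
  • operation (::GRPC::ActiveCall::Operation)
Returns
  • (::Gapic::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>)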
Raises
  • (::Google::Cloud::Error) — if the RPC is aborted.
Example

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchProfilesRequest.new

# Call the fetch_profiles method.
result = client.fetch_profiles request

# The returned object is of type Gapic::PagedEnumerable. You can iterate
# over elements, and API calls will be issued to fetch pages as needed.
result.each do |item|
  # Each element is of type ::Google::Cloud::GkeRecommender::V1::Profile.
  p item
end
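
Filters are passed as keyword arguments, and performance_requirements accepts a Hash in place of a PerformanceRequirements message. The following sketch builds such a filter and collects the matching profiles. The helper name profiles_meeting is hypothetical, and the PerformanceRequirements field names target_ntpot_milliseconds and target_ttft_milliseconds are assumptions not confirmed by this page.

```ruby
# Hypothetical helper: fetches the profiles for a model that meet latency targets.
# Assumption: PerformanceRequirements has the fields
# target_ntpot_milliseconds and target_ttft_milliseconds.
def profiles_meeting(client, model:, ntpot_ms: nil, ttft_ms: nil)
  requirements = {}
  requirements[:target_ntpot_milliseconds] = ntpot_ms if ntpot_ms
  requirements[:target_ttft_milliseconds] = ttft_ms if ttft_ms

  args = { model: model }
  # Omit the filter entirely when no target was given.
  args[:performance_requirements] = requirements unless requirements.empty?

  # The result is enumerable; converting it to an Array reads every element.
  client.fetch_profiles(**args).to_a
end

# With a real client (requires the gem and credentials):
#   profiles_meeting(client, model: "owner/model_name", ttft_ms: 500).each { |profile| p profile }
```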

#generate_optimized_manifest

def generate_optimized_manifest(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
def generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See "Run best practice inference with GKE Inference Quickstart recipes" for deployment details.

Overloads
def generate_optimized_manifest(request, options = nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Pass arguments to generate_optimized_manifest via a request object, either of type Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest or an equivalent Hash.
Parameters
  • request (::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest, ::Hash) — A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
  • options (::Gapic::CallOptions, ::Hash) — Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
def generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) -> ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Pass arguments to generate_optimized_manifest via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters
  • model_server_info (::Google::Cloud::GkeRecommender::V1::ModelServerInfo, ::Hash) — Required. The model server configuration to generate the manifest for. Use GkeInferenceQuickstart.FetchProfiles to find valid configurations.
  • accelerator_type (::String) — Required. The accelerator type. Use GkeInferenceQuickstart.FetchProfiles to find valid accelerators for a given model_server_info.
  • kubernetes_namespace (::String) — Optional. The Kubernetes namespace to deploy the manifests in.
  • performance_requirements (::Google::Cloud::GkeRecommender::V1::PerformanceRequirements, ::Hash) — Optional. The performance requirements to use for generating Horizontal Pod Autoscaler (HPA) resources. If provided, the manifest includes HPA resources to adjust the model server replica count to maintain the specified targets (e.g., NTPOT, TTFT) at a P50 latency. Cost targets are not currently supported for HPA generation. If the specified targets are not achievable, the HPA manifest will not be generated.
  • storage_config (::Google::Cloud::GkeRecommender::V1::StorageConfig, ::Hash) — Optional. The storage configuration for the model. If not provided, the model is loaded from Hugging Face.
Yields
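  • (response, operation) — Access the result along with the RPC operation.
Yield Parameters
  • response (::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse)
  • operation (::GRPC::ActiveCall::Operation)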
Raises
  • (::Google::Cloud::Error) — if the RPC is aborted.
Example

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest.new

# Call the generate_optimized_manifest method.
result = client.generate_optimized_manifest request

# The returned object is of type Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse.
p result
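
A common next step is to turn the response into a file that kubectl can apply. The sketch below joins the generated manifests into one multi-document YAML string. The helper name manifests_to_yaml is hypothetical, and the response shape (a kubernetes_manifests list whose entries have a content string) is an assumption not confirmed by this page.

```ruby
# Hypothetical helper: joins generated manifests into multi-document YAML.
# Assumption: the response exposes kubernetes_manifests, and each entry
# carries its YAML text in a content field.
def manifests_to_yaml(response)
  response.kubernetes_manifests
          .map { |manifest| manifest.content.to_s.strip }
          .reject(&:empty?)
          .join("\n---\n") + "\n"
end

# With a real client (requires the gem and credentials):
#   response = client.generate_optimized_manifest request
#   File.write "manifest.yaml", manifests_to_yaml(response)
```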

#initialize

def initialize() { |config| ... } -> Client

Create a new GkeInferenceQuickstart client object.

Yields
  • (config) — Configure the GkeInferenceQuickstart client.
Yield Parameter
  • config (Client::Configuration)
Returns
  • (Client) — a new instance of Client
Example
# Create a client using the default configuration
client = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new

# Create a client using a custom configuration
client = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Client.new do |config|
  config.timeout = 10.0
end

#logger

def logger() -> Logger

The logger used for request/response debug logging.

Returns
  • (Logger)

#universe_domain

def universe_domain() -> String

The effective universe domain.

Returns
  • (String)