Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class GkeInferenceQuickstartClient.
Service Description: The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators.
These profiles help generate optimized best practices for running inference on GKE.
This class provides the ability to make remote calls to the backing service through method calls that map to API methods.
Namespace
Google \ Cloud \ GkeRecommender \ V1 \ Client
Methods
__construct
Constructor.
| Parameters | |
|---|---|
| Name | Description |
| options | `array\|Google\ApiCore\Options\ClientOptions` Optional. Options for configuring the service API wrapper. |
| ↳ apiEndpoint | `string` The address of the API remote host. May optionally include the port, formatted as " |
| ↳ credentials | `FetchAuthTokenInterface\|CredentialsWrapper` This option should only be used with a pre-constructed Google\Auth\FetchAuthTokenInterface or Google\ApiCore\CredentialsWrapper object. Note that when one of these objects is provided, any settings in $credentialsConfig will be ignored. Important: If you are providing a path to a credentials file, or a decoded credentials file as a PHP array, this usage is now DEPRECATED. Providing an unvalidated credential configuration to Google APIs can compromise the security of your systems and data. It is recommended to create the credentials explicitly |
| ↳ credentialsConfig | `array` Options used to configure credentials, including auth token caching, for the client. For a full list of supported configuration options, see Google\ApiCore\CredentialsWrapper::build(). |
| ↳ disableRetries | `bool` Determines whether or not retries defined by the client configuration should be disabled. Defaults to |
| ↳ clientConfig | `string\|array` Client method configuration, including retry settings. This option can be either a path to a JSON file, or a PHP array containing the decoded JSON data. By default this setting points to the default client config file, which is provided in the resources folder. |
| ↳ transport | `string\|TransportInterface` The transport used for executing network requests. May be either the string |
| ↳ transportConfig | `array` Configuration options that will be used to construct the transport. Options for each supported transport type should be passed in a key for that transport. For example: `$transportConfig = [ 'grpc' => [...], 'rest' => [...], ];` See the Google\ApiCore\Transport\GrpcTransport::build() and Google\ApiCore\Transport\RestTransport::build() methods for the supported options. |
| ↳ clientCertSource | `callable` A callable which returns the client cert as a string. This can be used to provide a certificate and private key to the transport layer for mTLS. |
| ↳ logger | `false\|LoggerInterface` A PSR-3 compliant logger. If set to false, logging is disabled, ignoring the 'GOOGLE_SDK_PHP_LOGGING' environment flag. |
| ↳ universeDomain | `string` The service domain for the client. Defaults to 'googleapis.com'. |
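As a minimal sketch of how these options might be combined, the following builds an options array and passes it to the constructor. The helper names `build_client_options` and `create_client` are hypothetical, the endpoint value is an assumption derived from the service name, and `'rest'` is assumed to be an accepted transport string (it matches the `rest` key shown for `transportConfig`).

```php
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;

/**
 * Hypothetical helper (not part of the library): assembles constructor options.
 *
 * @param bool $useRest Whether to request the REST transport explicitly.
 */
function build_client_options(bool $useRest = false): array
{
    $options = [
        // Assumed endpoint; verify it against the service documentation.
        'apiEndpoint' => 'gkerecommender.googleapis.com:443',
        // Keep the retries defined by the client configuration.
        'disableRetries' => false,
        // The documented default, spelled out for clarity.
        'universeDomain' => 'googleapis.com',
    ];
    if ($useRest) {
        // Assumption: 'rest' selects the REST transport.
        $options['transport'] = 'rest';
    }
    return $options;
}

/**
 * Hypothetical helper: creates a client from the options above.
 * Requires the library and valid credentials in the environment.
 */
function create_client(bool $useRest = false): GkeInferenceQuickstartClient
{
    return new GkeInferenceQuickstartClient(build_client_options($useRest));
}
```

Keeping the options in a separate function makes them easy to inspect or override before any network call is made.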
fetchBenchmarkingData
Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.
The async variant is GkeInferenceQuickstartClient::fetchBenchmarkingDataAsync() .
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest` A request to house fields associated with the call. |
| callOptions | `array` Optional. |
| ↳ retrySettings | `RetrySettings\|array` Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage. |

| Returns | |
|---|---|
| Type | Description |
| `Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse` | |
```php
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse;
use Google\Cloud\GkeRecommender\V1\ModelServerInfo;

/**
 * @param string $modelServerInfoModel The model. Open-source models follow the Huggingface Hub
 *     `owner/model_name` format. Use
 *     [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *     to find available models.
 * @param string $modelServerInfoModelServer The model server. Open-source model servers use simplified,
 *     lowercase names (e.g., `vllm`). Use
 *     [GkeInferenceQuickstart.FetchModelServers][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModelServers]
 *     to find available servers.
 */
function fetch_benchmarking_data_sample(
    string $modelServerInfoModel,
    string $modelServerInfoModelServer
): void {
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $modelServerInfo = (new ModelServerInfo())
        ->setModel($modelServerInfoModel)
        ->setModelServer($modelServerInfoModelServer);
    $request = (new FetchBenchmarkingDataRequest())
        ->setModelServerInfo($modelServerInfo);

    // Call the API and handle any network failures.
    try {
        /** @var FetchBenchmarkingDataResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchBenchmarkingData($request);
        printf('Response data: %s' . PHP_EOL, $response->serializeToJsonString());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $modelServerInfoModel = '[MODEL]';
    $modelServerInfoModelServer = '[MODEL_SERVER]';

    fetch_benchmarking_data_sample($modelServerInfoModel, $modelServerInfoModelServer);
}
```
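The `retrySettings` call option described in the parameter table can be passed as an associative array. The following is a sketch only: the helper names are hypothetical, and the keys `retriesEnabled` and `totalTimeoutMillis` are assumed to be among the parameters accepted by Google\ApiCore\RetrySettings, so check them against that class before relying on them.

```php
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse;

/**
 * Hypothetical helper: builds the callOptions array for a single call.
 *
 * @param int $totalTimeoutMillis Overall time budget for the call, retries included.
 */
function retry_call_options(int $totalTimeoutMillis): array
{
    return [
        'retrySettings' => [
            // Assumed RetrySettings keys; see Google\ApiCore\RetrySettings.
            'retriesEnabled' => true,
            'totalTimeoutMillis' => $totalTimeoutMillis,
        ],
    ];
}

/**
 * Hypothetical wrapper: calls fetchBenchmarkingData with a 30-second budget.
 * Returns null when the call fails.
 */
function fetch_benchmarking_data_with_retry(
    GkeInferenceQuickstartClient $client,
    FetchBenchmarkingDataRequest $request
): ?FetchBenchmarkingDataResponse {
    try {
        return $client->fetchBenchmarkingData($request, retry_call_options(30000));
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
        return null;
    }
}
```

The same `callOptions` array shape applies to the other synchronous methods on this client, since each documents the same `retrySettings` option.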
fetchModelServerVersions
Fetches available model server versions. Open-source servers use their own
versioning schemas (e.g., vllm uses semver like v1.0.0).
Some model servers have different versioning schemas depending on the
accelerator. For example, vllm uses semver on GPUs, but returns nightly
build tags on TPUs. All available versions will be returned when different
schemas are present.
The async variant is GkeInferenceQuickstartClient::fetchModelServerVersionsAsync() .
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest` A request to house fields associated with the call. |
| callOptions | `array` Optional. |
| ↳ retrySettings | `RetrySettings\|array` Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage. |

| Returns | |
|---|---|
| Type | Description |
| `Google\ApiCore\PagedListResponse` | |
```php
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest;

/**
 * @param string $model The model for which to list model server versions. Open-source
 *     models follow the Huggingface Hub `owner/model_name` format. Use
 *     [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *     to find available models.
 * @param string $modelServer The model server for which to list versions. Open-source model
 *     servers use simplified, lowercase names (e.g., `vllm`). Use
 *     [GkeInferenceQuickstart.FetchModelServers][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModelServers]
 *     to find available model servers.
 */
function fetch_model_server_versions_sample(string $model, string $modelServer): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = (new FetchModelServerVersionsRequest())
        ->setModel($model)
        ->setModelServer($modelServer);

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchModelServerVersions($request);

        /** @var string $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element);
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $model = '[MODEL]';
    $modelServer = '[MODEL_SERVER]';

    fetch_model_server_versions_sample($model, $modelServer);
}
```
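Because this method returns a Google\ApiCore\PagedListResponse, results can also be consumed one page at a time rather than as a flat stream. The sketch below assumes the `iteratePages()` method of that class; the helper names `group_by_page` and `fetch_versions_by_page` are hypothetical.

```php
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest;

/**
 * Hypothetical helper: turns an iterable of pages into a list of lists,
 * one inner list per page.
 */
function group_by_page(iterable $pages): array
{
    $grouped = [];
    foreach ($pages as $page) {
        $items = [];
        foreach ($page as $element) {
            $items[] = $element;
        }
        $grouped[] = $items;
    }
    return $grouped;
}

/**
 * Hypothetical wrapper: fetches versions and keeps the page boundaries.
 * Returns an empty array when the call fails.
 */
function fetch_versions_by_page(
    GkeInferenceQuickstartClient $client,
    string $model,
    string $modelServer
): array {
    $request = (new FetchModelServerVersionsRequest())
        ->setModel($model)
        ->setModelServer($modelServer);
    try {
        $response = $client->fetchModelServerVersions($request);
        // Pages are fetched lazily, so iteration stays inside the try block.
        return group_by_page($response->iteratePages());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
        return [];
    }
}
```

Page-wise iteration is mainly useful when each page should be processed or displayed as a batch; for simple listing, iterating the response directly as in the sample above is enough.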
fetchModelServers
Fetches available model servers. Open-source model servers use simplified,
lowercase names (e.g., vllm).
The async variant is GkeInferenceQuickstartClient::fetchModelServersAsync() .
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchModelServersRequest` A request to house fields associated with the call. |
| callOptions | `array` Optional. |
| ↳ retrySettings | `RetrySettings\|array` Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage. |

| Returns | |
|---|---|
| Type | Description |
| `Google\ApiCore\PagedListResponse` | |
```php
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServersRequest;

/**
 * @param string $model The model for which to list model servers. Open-source models
 *     follow the Huggingface Hub `owner/model_name` format. Use
 *     [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *     to find available models.
 */
function fetch_model_servers_sample(string $model): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = (new FetchModelServersRequest())
        ->setModel($model);

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchModelServers($request);

        /** @var string $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element);
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $model = '[MODEL]';

    fetch_model_servers_sample($model);
}
```
fetchModels
Fetches available models. Open-source models follow the Huggingface Hub
owner/model_name format.
The async variant is GkeInferenceQuickstartClient::fetchModelsAsync() .
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchModelsRequest` A request to house fields associated with the call. |
| callOptions | `array` Optional. |
| ↳ retrySettings | `RetrySettings\|array` Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage. |

| Returns | |
|---|---|
| Type | Description |
| `Google\ApiCore\PagedListResponse` | |
```php
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelsRequest;

/**
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function fetch_models_sample(): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = new FetchModelsRequest();

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchModels($request);

        /** @var string $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element);
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}
```
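The doc comments in the samples above point from model servers and versions back to `fetchModels` for valid input values, which suggests chaining the three listing calls. The following sketch does that using only the request setters shown in those samples; the function name `list_server_versions` and the `$maxModels` cap are hypothetical additions to keep the number of API calls bounded.

```php
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest;
use Google\Cloud\GkeRecommender\V1\FetchModelServersRequest;
use Google\Cloud\GkeRecommender\V1\FetchModelsRequest;

/**
 * Hypothetical helper: maps each model to its model servers and their versions.
 *
 * @param int $maxModels Upper bound on the number of models to inspect.
 * @return array<string, array<string, string[]>> Versions keyed by model, then server.
 */
function list_server_versions(GkeInferenceQuickstartClient $client, int $maxModels = 5): array
{
    $catalog = [];
    try {
        foreach ($client->fetchModels(new FetchModelsRequest()) as $model) {
            if (count($catalog) >= $maxModels) {
                break;
            }
            $catalog[$model] = [];

            // List the servers that can serve this model.
            $serversRequest = (new FetchModelServersRequest())->setModel($model);
            foreach ($client->fetchModelServers($serversRequest) as $server) {
                $catalog[$model][$server] = [];

                // List the versions available for this model and server pair.
                $versionsRequest = (new FetchModelServerVersionsRequest())
                    ->setModel($model)
                    ->setModelServer($server);
                foreach ($client->fetchModelServerVersions($versionsRequest) as $version) {
                    $catalog[$model][$server][] = $version;
                }
            }
        }
    } catch (ApiException $ex) {
        // Partial results collected before the failure are still returned.
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
    return $catalog;
}
```

Each nested loop issues its own request, so the total number of calls grows with the number of models and servers; the cap is there to keep exploratory runs cheap.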
fetchProfiles
Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.
The async variant is GkeInferenceQuickstartClient::fetchProfilesAsync() .
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchProfilesRequest` A request to house fields associated with the call. |
| callOptions | `array` Optional. |
| ↳ retrySettings | `RetrySettings\|array` Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage. |

| Returns | |
|---|---|
| Type | Description |
| `Google\ApiCore\PagedListResponse` | |
```php
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchProfilesRequest;
use Google\Cloud\GkeRecommender\V1\Profile;

/**
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function fetch_profiles_sample(): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = new FetchProfilesRequest();

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchProfiles($request);

        /** @var Profile $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element->serializeToJsonString());
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}
```
generateOptimizedManifest
Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See "Run best practice inference with GKE Inference Quickstart recipes" for deployment details.
The async variant is GkeInferenceQuickstartClient::generateOptimizedManifestAsync() .
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestRequest` A request to house fields associated with the call. |
| callOptions | `array` Optional. |
| ↳ retrySettings | `RetrySettings\|array` Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage. |

| Returns | |
|---|---|
| Type | Description |
| `Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestResponse` | |
```php
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestRequest;
use Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestResponse;
use Google\Cloud\GkeRecommender\V1\ModelServerInfo;

/**
 * @param string $modelServerInfoModel The model. Open-source models follow the Huggingface Hub
 *     `owner/model_name` format. Use
 *     [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *     to find available models.
 * @param string $modelServerInfoModelServer The model server. Open-source model servers use simplified,
 *     lowercase names (e.g., `vllm`). Use
 *     [GkeInferenceQuickstart.FetchModelServers][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModelServers]
 *     to find available servers.
 * @param string $acceleratorType The accelerator type. Use
 *     [GkeInferenceQuickstart.FetchProfiles][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchProfiles]
 *     to find valid accelerators for a given `model_server_info`.
 */
function generate_optimized_manifest_sample(
    string $modelServerInfoModel,
    string $modelServerInfoModelServer,
    string $acceleratorType
): void {
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $modelServerInfo = (new ModelServerInfo())
        ->setModel($modelServerInfoModel)
        ->setModelServer($modelServerInfoModelServer);
    $request = (new GenerateOptimizedManifestRequest())
        ->setModelServerInfo($modelServerInfo)
        ->setAcceleratorType($acceleratorType);

    // Call the API and handle any network failures.
    try {
        /** @var GenerateOptimizedManifestResponse $response */
        $response = $gkeInferenceQuickstartClient->generateOptimizedManifest($request);
        printf('Response data: %s' . PHP_EOL, $response->serializeToJsonString());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $modelServerInfoModel = '[MODEL]';
    $modelServerInfoModelServer = '[MODEL_SERVER]';
    $acceleratorType = '[ACCELERATOR_TYPE]';

    generate_optimized_manifest_sample(
        $modelServerInfoModel,
        $modelServerInfoModelServer,
        $acceleratorType
    );
}
```
fetchBenchmarkingDataAsync
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest` |
| optionalArgs | `array` |

| Returns | |
|---|---|
| Type | Description |
| `GuzzleHttp\Promise\PromiseInterface<Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse>` | |
fetchModelServerVersionsAsync
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest` |
| optionalArgs | `array` |

| Returns | |
|---|---|
| Type | Description |
| `GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>` | |
fetchModelServersAsync
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchModelServersRequest` |
| optionalArgs | `array` |

| Returns | |
|---|---|
| Type | Description |
| `GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>` | |
fetchModelsAsync
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchModelsRequest` |
| optionalArgs | `array` |

| Returns | |
|---|---|
| Type | Description |
| `GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>` | |
fetchProfilesAsync
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\FetchProfilesRequest` |
| optionalArgs | `array` |

| Returns | |
|---|---|
| Type | Description |
| `GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>` | |
generateOptimizedManifestAsync
| Parameters | |
|---|---|
| Name | Description |
| request | `Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestRequest` |
| optionalArgs | `array` |

| Returns | |
|---|---|
| Type | Description |
| `GuzzleHttp\Promise\PromiseInterface<Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestResponse>` | |
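The async variants listed above return a GuzzleHttp\Promise\PromiseInterface. As a rough sketch of how one might be consumed, the following uses `fetchModelsAsync()` with the promise's `then()` and `wait()` methods and the `iterateAllElements()` method of Google\ApiCore\PagedListResponse. The function name `fetch_models_async_sample` is hypothetical, and the sketch assumes a failed call surfaces as an ApiException when the promise is waited on.

```php
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelsRequest;

/**
 * Hypothetical helper: fetches model names through the async variant.
 *
 * @return string[] Model names, or an empty array when the call fails.
 */
function fetch_models_async_sample(GkeInferenceQuickstartClient $client): array
{
    // Start the call and attach a callback that flattens the paged response.
    $promise = $client->fetchModelsAsync(new FetchModelsRequest())
        ->then(function (PagedListResponse $response): array {
            return iterator_to_array($response->iterateAllElements(), false);
        });

    try {
        // Block until the promise settles and return the callback's result.
        return $promise->wait();
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
        return [];
    }
}
```

Other work can be scheduled between creating the promise and calling `wait()`; when nothing else needs to run in the meantime, the synchronous methods are simpler.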