Google Cloud Gke Recommender V1 Client - Class GkeInferenceQuickstartClient (0.1.0)

Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class GkeInferenceQuickstartClient.

Service Description: The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators.

These profiles help generate optimized best practices for running inference on GKE.

This class provides the ability to make remote calls to the backing service through method calls that map to API methods.

Namespace

Google \ Cloud \ GkeRecommender \ V1 \ Client

Methods

__construct

Constructor.

Parameters
Name Description
options array|Google\ApiCore\Options\ClientOptions

Optional. Options for configuring the service API wrapper.

↳ apiEndpoint string

The address of the API remote host. May optionally include the port.

↳ credentials FetchAuthTokenInterface|CredentialsWrapper

This option should only be used with a pre-constructed Google\Auth\FetchAuthTokenInterface or Google\ApiCore\CredentialsWrapper object. Note that when one of these objects is provided, any settings in $credentialsConfig will be ignored.

Important: Providing a path to a credentials file, or a decoded credentials file as a PHP array, is now DEPRECATED. Providing an unvalidated credential configuration to Google APIs can compromise the security of your systems and data. It is recommended to create the credentials explicitly:

use Google\Auth\Credentials\ServiceAccountCredentials;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;

$creds = new ServiceAccountCredentials($scopes, $json);
$client = new GkeInferenceQuickstartClient(['credentials' => $creds]);

For details, see https://cloud.google.com/docs/authentication/external/externally-sourced-credentials

↳ credentialsConfig array

Options used to configure credentials, including auth token caching, for the client. For a full list of supporting configuration options, see Google\ApiCore\CredentialsWrapper::build() .

↳ disableRetries bool

Determines whether or not retries defined by the client configuration should be disabled. Defaults to false.

↳ clientConfig string|array

Client method configuration, including retry settings. This option can be either a path to a JSON file, or a PHP array containing the decoded JSON data. By default this setting points to the default client config file, which is provided in the resources folder.

↳ transport string|TransportInterface

The transport used for executing network requests. May be either the string 'rest' or 'grpc'. Defaults to 'grpc' if gRPC support is detected on the system. Advanced usage: Additionally, it is possible to pass in an already instantiated Google\ApiCore\Transport\TransportInterface object. Note that when this object is provided, any settings in $transportConfig, and any $apiEndpoint setting, will be ignored.

↳ transportConfig array

Configuration options that will be used to construct the transport. Options for each supported transport type should be passed in a key for that transport. For example:

$transportConfig = [
    'grpc' => [...],
    'rest' => [...],
];

See the Google\ApiCore\Transport\GrpcTransport::build() and Google\ApiCore\Transport\RestTransport::build() methods for the supported options.

↳ clientCertSource callable

A callable which returns the client cert as a string. This can be used to provide a certificate and private key to the transport layer for mTLS.

↳ logger false|LoggerInterface

A PSR-3 compliant logger. If set to false, logging is disabled, ignoring the 'GOOGLE_SDK_PHP_LOGGING' environment flag.

↳ universeDomain string

The service domain for the client. Defaults to 'googleapis.com'.
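
The constructor options above can be combined into a single array. The following is a minimal sketch, not an official sample: the helper name build_client_options is hypothetical, the option keys are the ones listed above, and any endpoint value passed in is an assumption you would replace with the endpoint for your environment.

```php
<?php

use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;

/**
 * Builds a constructor options array from keys documented above.
 * (Hypothetical helper, not part of the client library.)
 *
 * @param string|null $apiEndpoint Optional endpoint override; null keeps the library default.
 * @param string      $transport   Either 'rest' or 'grpc'.
 */
function build_client_options(?string $apiEndpoint = null, string $transport = 'rest'): array
{
    if (!in_array($transport, ['rest', 'grpc'], true)) {
        throw new InvalidArgumentException('transport must be "rest" or "grpc"');
    }

    $options = [
        'transport' => $transport,
        'disableRetries' => false,            // documented default
        'universeDomain' => 'googleapis.com', // documented default
    ];

    // Only set apiEndpoint when the caller supplies one.
    if ($apiEndpoint !== null) {
        $options['apiEndpoint'] = $apiEndpoint;
    }

    return $options;
}

// Usage (requires the client library to be installed and credentials to be available):
// $client = new GkeInferenceQuickstartClient(build_client_options());
```

Leaving apiEndpoint unset keeps the library's default endpoint, which is usually what you want unless a regional endpoint is required.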

fetchBenchmarkingData

Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.

The async variant is GkeInferenceQuickstartClient::fetchBenchmarkingDataAsync() .

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest

A request to house fields associated with the call.

callOptions array

Optional.

↳ retrySettings RetrySettings|array

Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage.

Returns
Type Description
Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse
Example
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse;
use Google\Cloud\GkeRecommender\V1\ModelServerInfo;

/**
 * @param string $modelServerInfoModel       The model. Open-source models follow the Huggingface Hub
 *                                           `owner/model_name` format. Use
 *                                           [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *                                           to find available models.
 * @param string $modelServerInfoModelServer The model server. Open-source model servers use simplified,
 *                                           lowercase names (e.g., `vllm`). Use
 *                                           [GkeInferenceQuickstart.FetchModelServers][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModelServers]
 *                                           to find available servers.
 */
function fetch_benchmarking_data_sample(
    string $modelServerInfoModel,
    string $modelServerInfoModelServer
): void {
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $modelServerInfo = (new ModelServerInfo())
        ->setModel($modelServerInfoModel)
        ->setModelServer($modelServerInfoModelServer);
    $request = (new FetchBenchmarkingDataRequest())
        ->setModelServerInfo($modelServerInfo);

    // Call the API and handle any network failures.
    try {
        /** @var FetchBenchmarkingDataResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchBenchmarkingData($request);
        printf('Response data: %s' . PHP_EOL, $response->serializeToJsonString());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $modelServerInfoModel = '[MODEL]';
    $modelServerInfoModelServer = '[MODEL_SERVER]';

    fetch_benchmarking_data_sample($modelServerInfoModel, $modelServerInfoModelServer);
}
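
The retrySettings call option described above can be passed as an associative array in the second argument. The sketch below assumes the key names used by Google\ApiCore\RetrySettings (retriesEnabled, initialRetryDelayMillis, retryDelayMultiplier, maxRetryDelayMillis, totalTimeoutMillis); check them against that class before relying on them. The helper names and the numeric values are illustrative assumptions, not recommendations.

```php
<?php

use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse;

/**
 * Returns call options that override part of the default retry behaviour.
 * (Hypothetical helper; values are placeholders.)
 */
function build_retry_call_options(int $totalTimeoutMillis = 30000): array
{
    return [
        'retrySettings' => [
            'retriesEnabled' => true,
            'initialRetryDelayMillis' => 500,
            'retryDelayMultiplier' => 2.0,
            'maxRetryDelayMillis' => 5000,
            'totalTimeoutMillis' => $totalTimeoutMillis,
        ],
    ];
}

/**
 * Calls fetchBenchmarkingData with the custom retry options.
 * Returns null when the call fails.
 */
function fetch_benchmarking_data_with_retries(
    GkeInferenceQuickstartClient $client,
    FetchBenchmarkingDataRequest $request
): ?FetchBenchmarkingDataResponse {
    try {
        return $client->fetchBenchmarkingData($request, build_retry_call_options());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
        return null;
    }
}
```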

fetchModelServerVersions

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

The async variant is GkeInferenceQuickstartClient::fetchModelServerVersionsAsync() .

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest

A request to house fields associated with the call.

callOptions array

Optional.

↳ retrySettings RetrySettings|array

Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage.

Returns
Type Description
Google\ApiCore\PagedListResponse
Example
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest;

/**
 * @param string $model       The model for which to list model server versions. Open-source
 *                            models follow the Huggingface Hub `owner/model_name` format. Use
 *                            [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *                            to find available models.
 * @param string $modelServer The model server for which to list versions. Open-source model
 *                            servers use simplified, lowercase names (e.g., `vllm`). Use
 *                            [GkeInferenceQuickstart.FetchModelServers][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModelServers]
 *                            to find available model servers.
 */
function fetch_model_server_versions_sample(string $model, string $modelServer): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = (new FetchModelServerVersionsRequest())
        ->setModel($model)
        ->setModelServer($modelServer);

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchModelServerVersions($request);

        /** @var string $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element);
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $model = '[MODEL]';
    $modelServer = '[MODEL_SERVER]';

    fetch_model_server_versions_sample($model, $modelServer);
}
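
Because the method returns a Google\ApiCore\PagedListResponse, iterating it as in the example above walks every element across all pages. When only the first few results are needed, stopping the loop early avoids consuming the rest. The following is a sketch under that assumption; take_first and fetch_first_model_server_versions are hypothetical helper names, not library functions.

```php
<?php

use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest;

/**
 * Collects at most $limit elements from any iterable, stopping as soon as
 * the limit is reached so the remaining elements are never requested.
 */
function take_first(iterable $elements, int $limit): array
{
    $taken = [];
    if ($limit <= 0) {
        return $taken;
    }
    foreach ($elements as $element) {
        $taken[] = $element;
        if (count($taken) >= $limit) {
            break;
        }
    }
    return $taken;
}

/**
 * Returns up to $limit version strings for the given model and model server.
 */
function fetch_first_model_server_versions(
    GkeInferenceQuickstartClient $client,
    string $model,
    string $modelServer,
    int $limit
): array {
    $request = (new FetchModelServerVersionsRequest())
        ->setModel($model)
        ->setModelServer($modelServer);

    // The paged response is iterable, so it can be passed straight to take_first().
    return take_first($client->fetchModelServerVersions($request), $limit);
}
```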

fetchModelServers

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

The async variant is GkeInferenceQuickstartClient::fetchModelServersAsync() .

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchModelServersRequest

A request to house fields associated with the call.

callOptions array

Optional.

↳ retrySettings RetrySettings|array

Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage.

Returns
Type Description
Google\ApiCore\PagedListResponse
Example
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServersRequest;

/**
 * @param string $model The model for which to list model servers. Open-source models
 *                      follow the Huggingface Hub `owner/model_name` format. Use
 *                      [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *                      to find available models.
 */
function fetch_model_servers_sample(string $model): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = (new FetchModelServersRequest())
        ->setModel($model);

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchModelServers($request);

        /** @var string $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element);
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $model = '[MODEL]';

    fetch_model_servers_sample($model);
}

fetchModels

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name format.

The async variant is GkeInferenceQuickstartClient::fetchModelsAsync() .

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchModelsRequest

A request to house fields associated with the call.

callOptions array

Optional.

↳ retrySettings RetrySettings|array

Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage.

Returns
Type Description
Google\ApiCore\PagedListResponse
Example
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelsRequest;

/**
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function fetch_models_sample(): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = new FetchModelsRequest();

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchModels($request);

        /** @var string $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element);
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}
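
The three listing methods can be chained: fetchModels yields model names, fetchModelServers lists the servers for a model, and fetchModelServerVersions lists the versions for a model and server pair. The sketch below uses only the request setters shown in the examples in this reference; the function name is hypothetical. Note that it issues one request per model and one per model and server pair, so it may be slow for large catalogs.

```php
<?php

use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest;
use Google\Cloud\GkeRecommender\V1\FetchModelServersRequest;
use Google\Cloud\GkeRecommender\V1\FetchModelsRequest;

/**
 * Builds a nested array of the form [model => [modelServer => [version, ...]]].
 * ApiException is left to propagate to the caller.
 */
function list_model_server_versions(GkeInferenceQuickstartClient $client): array
{
    $catalog = [];

    foreach ($client->fetchModels(new FetchModelsRequest()) as $model) {
        $serversRequest = (new FetchModelServersRequest())
            ->setModel($model);

        foreach ($client->fetchModelServers($serversRequest) as $modelServer) {
            $versionsRequest = (new FetchModelServerVersionsRequest())
                ->setModel($model)
                ->setModelServer($modelServer);

            // Start with an empty list so servers without versions still appear.
            $catalog[$model][$modelServer] = [];
            foreach ($client->fetchModelServerVersions($versionsRequest) as $version) {
                $catalog[$model][$modelServer][] = $version;
            }
        }
    }

    return $catalog;
}
```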

fetchProfiles

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.

The async variant is GkeInferenceQuickstartClient::fetchProfilesAsync() .

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchProfilesRequest

A request to house fields associated with the call.

callOptions array

Optional.

↳ retrySettings RetrySettings|array

Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage.

Returns
Type Description
Google\ApiCore\PagedListResponse
Example
use Google\ApiCore\ApiException;
use Google\ApiCore\PagedListResponse;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchProfilesRequest;
use Google\Cloud\GkeRecommender\V1\Profile;

/**
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function fetch_profiles_sample(): void
{
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $request = new FetchProfilesRequest();

    // Call the API and handle any network failures.
    try {
        /** @var PagedListResponse $response */
        $response = $gkeInferenceQuickstartClient->fetchProfiles($request);

        /** @var Profile $element */
        foreach ($response as $element) {
            printf('Element data: %s' . PHP_EOL, $element->serializeToJsonString());
        }
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

generateOptimizedManifest

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See "Run best practice inference with GKE Inference Quickstart recipes" for deployment details.

The async variant is GkeInferenceQuickstartClient::generateOptimizedManifestAsync() .

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestRequest

A request to house fields associated with the call.

callOptions array

Optional.

↳ retrySettings RetrySettings|array

Retry settings to use for this call. Can be a Google\ApiCore\RetrySettings object, or an associative array of retry settings parameters. See the documentation on Google\ApiCore\RetrySettings for example usage.

Returns
Type Description
Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestResponse
Example
use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestRequest;
use Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestResponse;
use Google\Cloud\GkeRecommender\V1\ModelServerInfo;

/**
 * @param string $modelServerInfoModel       The model. Open-source models follow the Huggingface Hub
 *                                           `owner/model_name` format. Use
 *                                           [GkeInferenceQuickstart.FetchModels][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModels]
 *                                           to find available models.
 * @param string $modelServerInfoModelServer The model server. Open-source model servers use simplified,
 *                                           lowercase names (e.g., `vllm`). Use
 *                                           [GkeInferenceQuickstart.FetchModelServers][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchModelServers]
 *                                           to find available servers.
 * @param string $acceleratorType            The accelerator type. Use
 *                                           [GkeInferenceQuickstart.FetchProfiles][google.cloud.gkerecommender.v1.GkeInferenceQuickstart.FetchProfiles]
 *                                           to find valid accelerators for a given `model_server_info`.
 */
function generate_optimized_manifest_sample(
    string $modelServerInfoModel,
    string $modelServerInfoModelServer,
    string $acceleratorType
): void {
    // Create a client.
    $gkeInferenceQuickstartClient = new GkeInferenceQuickstartClient();

    // Prepare the request message.
    $modelServerInfo = (new ModelServerInfo())
        ->setModel($modelServerInfoModel)
        ->setModelServer($modelServerInfoModelServer);
    $request = (new GenerateOptimizedManifestRequest())
        ->setModelServerInfo($modelServerInfo)
        ->setAcceleratorType($acceleratorType);

    // Call the API and handle any network failures.
    try {
        /** @var GenerateOptimizedManifestResponse $response */
        $response = $gkeInferenceQuickstartClient->generateOptimizedManifest($request);
        printf('Response data: %s' . PHP_EOL, $response->serializeToJsonString());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}

/**
 * Helper to execute the sample.
 *
 * This sample has been automatically generated and should be regarded as a code
 * template only. It will require modifications to work:
 *  - It may require correct/in-range values for request initialization.
 *  - It may require specifying regional endpoints when creating the service client,
 *    please see the apiEndpoint client configuration option for more details.
 */
function callSample(): void
{
    $modelServerInfoModel = '[MODEL]';
    $modelServerInfoModelServer = '[MODEL_SERVER]';
    $acceleratorType = '[ACCELERATOR_TYPE]';

    generate_optimized_manifest_sample(
        $modelServerInfoModel,
        $modelServerInfoModelServer,
        $acceleratorType
    );
}

fetchBenchmarkingDataAsync

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest
optionalArgs array
Returns
Type Description
GuzzleHttp\Promise\PromiseInterface<Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse>

fetchModelServerVersionsAsync

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchModelServerVersionsRequest
optionalArgs array
Returns
Type Description
GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>

fetchModelServersAsync

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchModelServersRequest
optionalArgs array
Returns
Type Description
GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>

fetchModelsAsync

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchModelsRequest
optionalArgs array
Returns
Type Description
GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>

fetchProfilesAsync

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\FetchProfilesRequest
optionalArgs array
Returns
Type Description
GuzzleHttp\Promise\PromiseInterface<Google\ApiCore\PagedListResponse>

generateOptimizedManifestAsync

Parameters
Name Description
request Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestRequest
optionalArgs array
Returns
Type Description
GuzzleHttp\Promise\PromiseInterface<Google\Cloud\GkeRecommender\V1\GenerateOptimizedManifestResponse>
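
The async variants return a GuzzleHttp\Promise\PromiseInterface instead of the response itself. The following is a minimal sketch of one way to use them, mirroring the synchronous fetchBenchmarkingData example: the request is started, and wait() blocks until the response is available. It assumes that a failed call surfaces as an ApiException when the promise is awaited; the function name is hypothetical.

```php
<?php

use Google\ApiCore\ApiException;
use Google\Cloud\GkeRecommender\V1\Client\GkeInferenceQuickstartClient;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataRequest;
use Google\Cloud\GkeRecommender\V1\FetchBenchmarkingDataResponse;
use Google\Cloud\GkeRecommender\V1\ModelServerInfo;
use GuzzleHttp\Promise\PromiseInterface;

function fetch_benchmarking_data_async_sample(
    GkeInferenceQuickstartClient $client,
    string $model,
    string $modelServer
): void {
    // Prepare the request message, as in the synchronous example.
    $modelServerInfo = (new ModelServerInfo())
        ->setModel($model)
        ->setModelServer($modelServer);
    $request = (new FetchBenchmarkingDataRequest())
        ->setModelServerInfo($modelServerInfo);

    /** @var PromiseInterface $promise */
    $promise = $client->fetchBenchmarkingDataAsync($request);

    // Other work can happen here before the result is needed.

    try {
        /** @var FetchBenchmarkingDataResponse $response */
        $response = $promise->wait();
        printf('Response data: %s' . PHP_EOL, $response->serializeToJsonString());
    } catch (ApiException $ex) {
        printf('Call failed with message: %s' . PHP_EOL, $ex->getMessage());
    }
}
```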