gcloud container ai profiles

INFORMATION
gcloud container ai profiles is supported in your configured universe domain; however, some of the values used in the help text may not be available there. Command examples may not work as-is and may require changes before execution.
NAME
gcloud container ai profiles - quickstart engine for GKE AI workloads
SYNOPSIS
gcloud container ai profiles GROUP | COMMAND [GCLOUD_WIDE_FLAG]
DESCRIPTION
The GKE Inference Quickstart simplifies deploying AI inference workloads on Google Kubernetes Engine (GKE). It provides tailored profiles based on Google's internal benchmarks. You supply inputs such as your preferred open-source model (e.g., Llama, Gemma, or Mistral) and your application's performance target; from these, the quickstart generates accelerator choices with performance metrics, plus detailed, ready-to-deploy profiles for compute, load balancing, and autoscaling. The profiles are delivered as standard Kubernetes YAML manifests, which you can deploy as-is or modify.
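A typical end-to-end session might look like the following sketch. The model name, model server, and accelerator values are illustrative placeholders, and flag names may differ between gcloud releases; run each command with --help to confirm what your installed version accepts.

```shell
# 1. Discover which open-source models have benchmarked profiles.
gcloud container ai profiles models list

# 2. List compatible accelerator profiles for a chosen model
#    (model name below is a placeholder).
gcloud container ai profiles list \
    --model=meta-llama/Llama-3.1-8B-Instruct

# 3. Generate ready-to-deploy Kubernetes manifests for a selected
#    profile (flag names are assumptions; verify with --help).
gcloud container ai profiles manifests create \
    --model=meta-llama/Llama-3.1-8B-Instruct \
    --model-server=vllm \
    --accelerator-type=nvidia-l4 > manifests.yaml

# 4. Review and apply the manifests to your GKE cluster.
kubectl apply -f manifests.yaml
```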
GCLOUD WIDE FLAGS
These flags are available to all commands: --help.

Run $ gcloud help for details.

GROUPS
GROUP is one of the following:
benchmarks
Manage benchmarks for GKE Inference Quickstart.
manifests
Generate optimized Kubernetes manifests.
model-server-versions
Manage supported model server versions for GKE Inference Quickstart.
model-servers
Manage supported model servers for GKE Inference Quickstart.
models
Manage supported models for GKE Inference Quickstart.
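The discovery groups above are designed to be chained: each group narrows the options for the next. A hedged sketch, assuming the list subcommands accept the filter flags shown (the model and server names are placeholders; confirm flags with --help):

```shell
# Which model servers are supported for a given model?
gcloud container ai profiles model-servers list \
    --model=google/gemma-2-9b-it

# Which versions of that model server are supported?
gcloud container ai profiles model-server-versions list \
    --model=google/gemma-2-9b-it \
    --model-server=vllm
```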
COMMANDS
COMMAND is one of the following:
list
List compatible accelerator profiles.
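The list command can also take a performance target, so the returned accelerator profiles are ranked against your latency goal. The target flag below is an assumption based on the quickstart's performance-target inputs and may be named differently in your release:

```shell
# List accelerator profiles for a model, constrained by a latency
# target (normalized time per output token, in milliseconds).
# Both the model name and the target flag are illustrative.
gcloud container ai profiles list \
    --model=meta-llama/Llama-3.1-8B-Instruct \
    --target-ntpot-milliseconds=200
```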
NOTES
This variant is also available:
gcloud alpha container ai profiles