This guide shows you how to optimize GPU provisioning for small- and medium-scale training workloads by using flex-start provisioning mode. In this guide, you use flex-start to deploy a workload that consists of two Kubernetes Jobs, each of which requires one GPU. GKE automatically provisions a single node with two A100 GPUs to run both Jobs.
If your workload requires multi-node distributed processing, consider using flex-start with queued provisioning. For more information, see Run a large-scale workload with flex-start with queued provisioning.
This guide is intended for machine learning (ML) engineers, platform admins and operators, and data and AI specialists who are interested in using Kubernetes container orchestration capabilities to run batch workloads. For more information about common roles and example tasks that we reference in Trusted Cloud by S3NS content, see Common GKE user roles and tasks.
Flex-start pricing
Flex-start is recommended if your workload needs resources that are dynamically provisioned as needed, for up to seven days, with short-term reservations, no complex quota management, and cost-effective access. Flex-start is powered by Dynamic Workload Scheduler and is billed using Dynamic Workload Scheduler pricing:
- vCPUs, GPUs, and TPUs are discounted (up to 53%).
- You pay as you go.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Verify that you have an Autopilot cluster or a Standard cluster that's running version 1.33.0-gke.1712000 or later.
- Verify that you're familiar with limitations of flex-start.
- When using a Standard cluster, verify that you maintain at least one node pool without flex-start enabled for the cluster to function correctly.
- Verify that you have quota for preemptible GPUs in your node locations.
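You can check your regional quota from the gcloud CLI. The following is a minimal sketch; us-central1 is an assumed example region, and the quota metric names in your output depend on the GPU models available in that region:

# Print the quota metrics for an example region; replace us-central1 with your region
gcloud compute regions describe us-central1 \
    --format="json(quotas)"

In the output, look for preemptible GPU metrics such as PREEMPTIBLE_NVIDIA_L4_GPUS and verify that the limit covers your planned usage.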
If you don't have a cluster, or your cluster doesn't meet the requirements, you can create a Standard regional cluster by using the gcloud CLI. Add the following flags so that you can follow the flex-start examples in this guide:
--location=us-central1 \
--node-locations=us-central1-a,us-central1-b \
--machine-type=g2-standard-8
When you create the flex-start node pool later in this guide, use the previously mentioned flags and --accelerator type=nvidia-l4,count=1.
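As a minimal sketch, the full cluster creation command might look like the following. The cluster name example-flex-cluster is a hypothetical value; choose your own:

# example-flex-cluster is a hypothetical name; replace it with your own
gcloud container clusters create example-flex-cluster \
    --location=us-central1 \
    --node-locations=us-central1-a,us-central1-b \
    --machine-type=g2-standard-8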
If you have a Standard cluster that meets the requirements, the next sections guide you through selecting a GPU accelerator type and machine type for your cluster.
Choose a GPU accelerator type
If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.
GPU availability is specific to each zone. You need to find a GPU accelerator type that is available in a zone that the Standard cluster is in. If you have a regional Standard cluster, the zone in which the GPU accelerator type is available must be in the region that the cluster is in. When you create the node pool, you specify the accelerator type and the zones for the nodes. If you specify an accelerator type that isn't available in the cluster's location, the node pool creation fails.
Run the following commands to get your cluster's location and a supported GPU accelerator type.
Get the location that the cluster is in:
gcloud container clusters list
The output is similar to the following:
NAME               LOCATION  MASTER_VERSION      MASTER_IP     MACHINE_TYPE  NODE_VERSION        NUM_NODES  STATUS   STACK_TYPE
example-cluster-1  us-west2  1.33.2-gke.1111000  34.102.3.122  e2-medium     1.33.2-gke.1111000  9          RUNNING  IPV4
List the available GPU accelerator types in the location, excluding Virtual Workstations:
gcloud compute accelerator-types list | grep LOCATION_NAME | grep -v "Workstation"
Replace LOCATION_NAME with the cluster's location.
For example, to get a list of GPU accelerator types in the us-west2 region, run the following command:
gcloud compute accelerator-types list | grep us-west2 | grep -v "Workstation"
The output is similar to the following:
nvidia-b200      us-west2-c  NVIDIA B200 180GB
nvidia-tesla-p4  us-west2-c  NVIDIA Tesla P4
nvidia-tesla-t4  us-west2-c  NVIDIA T4
nvidia-tesla-p4  us-west2-b  NVIDIA Tesla P4
nvidia-tesla-t4  us-west2-b  NVIDIA T4
Choose a compatible machine type
If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.
After you know which GPUs are available in the cluster's location, you can determine the compatible machine types. Trusted Cloud by S3NS restricts GPUs to specific machine series. Use the following steps to find a machine type:
- Refer to the GPU models available table.
- Locate the row for the GPU accelerator type you have chosen.
- Look at the "Machine series" column for that row. This column tells you which machine series you must use.
- To see the machine type names you can specify, click the link on the machine series.
The only exception is the N1 machine series; its row links to additional guidance on which N1 machine types you can use with your chosen accelerator type.
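As an optional check, you can also list the machine types in a zone from the gcloud CLI. The following is a minimal sketch; the G2 machine series and the us-west2-b zone are assumed example values, so adjust the name filter and zone for your case:

# List G2 machine types in an example zone
gcloud compute machine-types list \
    --zones us-west2-b \
    --filter="name ~ g2-standard"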
Before using an accelerator-optimized machine, make sure that it's supported with flex-start provisioning mode, as shown in Consumption option availability by machine type.
Determine the accelerator count
If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.
To create a node pool, you need to determine the number of accelerators to attach to each node in the node pool. Valid values depend on your accelerator type and machine type. Each machine type has a limit on how many GPUs it can support. To determine what value to use (besides the default of 1), as shown in the sketch after this list:
- Refer to GPU machine types.
- In the table, search for your accelerator type within your machine series.
- Use the value in the "GPU count" column.
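You can also query the per-instance limit for an accelerator type directly. The following is a minimal sketch, assuming the nvidia-l4 accelerator type in the us-west2-b zone; note that this prints the maximum for the accelerator type itself, and the exact limit for your chosen machine type still comes from the GPU machine types table:

# Print the maximum number of this accelerator type that one instance supports
gcloud compute accelerator-types describe nvidia-l4 \
    --zone us-west2-b \
    --format="value(maximumCardsPerInstance)"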
Create a node pool with flex-start
If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.
To create a node pool with flex-start enabled on an existing Standard cluster, you can use the gcloud CLI or Terraform.
gcloud
Create a node pool with flex-start:
gcloud container node-pools create NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION_NAME \
    --project PROJECT_ID \
    --accelerator type=ACCELERATOR_TYPE,count=COUNT \
    --machine-type MACHINE_TYPE \
    --max-run-duration MAX_RUN_DURATION \
    --flex-start \
    --node-locations NODE_ZONES \
    --num-nodes 0 \
    --enable-autoscaling \
    --total-min-nodes 0 \
    --total-max-nodes 5 \
    --location-policy ANY \
    --reservation-affinity none \
    --no-enable-autorepair
Replace the following:
- NODE_POOL_NAME: the name you choose for your node pool.
- CLUSTER_NAME: the name of the Standard cluster you want to modify.
- LOCATION_NAME: the compute region for the cluster control plane.
- PROJECT_ID: your project ID.
- ACCELERATOR_TYPE: the specific type of accelerator (for example, nvidia-tesla-t4 for NVIDIA T4) to attach to the instances.
- COUNT: the number of accelerators to attach to the instances. The default value is 1.
- MACHINE_TYPE: the type of machine to use for nodes.
- MAX_RUN_DURATION: optional. The maximum runtime of a node in seconds, up to the default of seven days. The number that you enter must end in s. For example, to specify one day, enter 86400s.
- NODE_ZONES: a comma-separated list of one or more zones where GKE creates the node pool.
In this command, the --flex-start flag instructs gcloud to create a node pool with flex-start enabled. GKE creates a node pool with nodes that contain the number of accelerators that you specified. The node pool initially has zero nodes, and autoscaling is enabled.
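As a filled-in sketch, the following command uses hypothetical values that match the example cluster described earlier in this guide; adjust them for your environment:

# All names and values below are example assumptions, not required settings
gcloud container node-pools create flex-start-pool \
    --cluster example-flex-cluster \
    --location us-central1 \
    --project my-project-id \
    --accelerator type=nvidia-l4,count=1 \
    --machine-type g2-standard-8 \
    --max-run-duration 86400s \
    --flex-start \
    --node-locations us-central1-a,us-central1-b \
    --num-nodes 0 \
    --enable-autoscaling \
    --total-min-nodes 0 \
    --total-max-nodes 5 \
    --location-policy ANY \
    --reservation-affinity none \
    --no-enable-autorepair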
Verify the status of flex-start in the node pool:
gcloud container node-pools describe NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION_NAME \
    --format="get(config.flexStart)"
If flex-start is enabled in the node pool, the flexStart field is set to True.
Terraform
You can use flex-start with GPUs by using a Terraform module.
- Add the following block to your Terraform configuration:
resource "google_container_node_pool" " "gpu_dws_pool" {
name = "gpu-dws-pool"
queued_provisioning {
enabled = false
}
}
node_config {
machine_type = "MACHINE_TYPE"
accelerator_type = "ACCELERATOR_TYPE"
accelerator_count = COUNT
node_locations = ["NODE_ZONES"]
flex_start = true
}
Replace the following:
- MACHINE_TYPE: the type of machine to use for nodes.
- ACCELERATOR_TYPE: the specific type of accelerator (for example, nvidia-tesla-t4 for NVIDIA T4) to attach to the instances.
- COUNT: the number of accelerators to attach to the instances. The default value is 1.
- NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
Terraform calls Trusted Cloud APIs to create a cluster with a node pool that uses flex-start with GPUs. The node pool initially has zero nodes, and autoscaling is enabled. To learn more about Terraform, see the google_container_node_pool resource spec on terraform.io.
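To apply the configuration, you can use the standard Terraform workflow. The following is a minimal sketch, run from the directory that contains your configuration:

# Download the required providers and initialize the working directory
terraform init
# Preview the changes that Terraform will make
terraform plan
# Create the node pool
terraform apply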
Run a batch workload
In this section, you create two Kubernetes Jobs that require one GPU each. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.
In the Trusted Cloud console, launch a Cloud Shell session by clicking Activate Cloud Shell. A session opens in the bottom pane of the Trusted Cloud console.
Create a file named dws-flex-start.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-1
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-flex-start: "true"
        cloud.google.com/gke-gpu-accelerator: ACCELERATOR_TYPE
      containers:
      - name: container-1
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-2
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-flex-start: "true"
      containers:
      - name: container-2
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure

Replace ACCELERATOR_TYPE with the GPU accelerator type that you chose earlier in this guide.
Apply the dws-flex-start.yaml manifest:

kubectl apply -f dws-flex-start.yaml
Verify that the Jobs are running on the same node:
kubectl get pods -l "job-name in (job-1,job-2)" -o wide
The output is similar to the following:
NAME   READY  STATUS     RESTARTS  AGE  IP        NODE               NOMINATED NODE  READINESS GATES
job-1  0/1    Completed  0         19m  10.(...)  gke-flex-zonal-a2  <none>          <none>
job-2  0/1    Completed  0         19m  10.(...)  gke-flex-zonal-a2  <none>          <none>
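Flex-start nodes can take a few minutes to provision while GKE obtains capacity, so the Pods might stay in the Pending state at first. As an optional check, you can watch for the node to appear and become Ready:

# Watch for the flex-start node to be created and become Ready
kubectl get nodes --watch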
Clean up
To avoid incurring charges to your Trusted Cloud by S3NS account for the resources that you used on this page, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Trusted Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the individual resources
Delete the Jobs:
kubectl delete job -l "job-name in (job-1,job-2)"
Delete the node pool:
gcloud container node-pools delete NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION_NAME
Delete the cluster:
gcloud container clusters delete CLUSTER_NAME \
    --location LOCATION_NAME
What's next
- Learn more about GPUs in GKE.
- Learn more about node auto-provisioning.
- Learn more about Best practices for running batch workloads on GKE.