Run a small batch workload with GPUs and flex-start provisioning mode


This guide shows you how to optimize GPU provisioning for medium- and small-scale training workloads by using flex-start provisioning mode. In this guide, you use flex-start to deploy a workload that consists of two Kubernetes Jobs, each of which requires one GPU. GKE automatically provisions a single node with two GPUs to run both Jobs.

If your workload requires multi-node distributed processing, consider using flex-start with queued provisioning. For more information, see Run a large-scale workload with flex-start with queued provisioning.

This guide is intended for Machine learning (ML) engineers, Platform admins and operators, and Data and AI specialists who are interested in using Kubernetes container orchestration capabilities for running batch workloads. For more information about common roles and example tasks that we reference in Trusted Cloud by S3NS content, see Common GKE user roles and tasks.

Flex-start pricing

Flex-start is recommended if your workload needs dynamically provisioned resources as needed, for up to seven days, with short-term reservations, no complex quota management, and cost-effective access. Flex-start is powered by Dynamic Workload Scheduler and is billed using Dynamic Workload Scheduler pricing:

  • Discounts of up to 53% on vCPUs, GPUs, and TPUs.
  • You pay as you go.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
  • Verify that you have an Autopilot cluster or a Standard cluster that's running version 1.33.0-gke.1712000 or later. To check the version, you can use the command sketched after this list.
  • Verify that you're familiar with limitations of flex-start.
  • When using a Standard cluster, verify that you maintain at least one node pool without flex-start enabled for the cluster to function correctly.
  • Verify that you have quota for preemptible GPUs in your node locations.
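
To check a cluster's version, you can run a command like the following sketch, where CLUSTER_NAME and LOCATION_NAME are placeholders for your cluster's name and location:

gcloud container clusters describe CLUSTER_NAME \
    --location LOCATION_NAME \
    --format="value(currentMasterVersion)"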

If you don't have a cluster, or your cluster doesn't meet the requirements, you can create a Standard regional cluster by using the gcloud CLI. Add the following flags so that you can try flex-start with the examples in this guide:

--location=us-central1 \
--node-locations=us-central1-a,us-central1-b \
--machine-type=g2-standard-8
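
For example, a complete cluster creation command that combines these flags might look like the following, where example-cluster is a hypothetical cluster name:

gcloud container clusters create example-cluster \
    --location=us-central1 \
    --node-locations=us-central1-a,us-central1-b \
    --machine-type=g2-standard-8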

When you create a flex-start node pool, use the previously mentioned flags and --accelerator type=nvidia-l4,count=1.

If you have a Standard cluster that meets the requirements, the next sections guide you through selecting a GPU accelerator type and machine type for your cluster.

Choose a GPU accelerator type

If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.

GPU availability is specific to each zone. You need to find a GPU accelerator type that is available in a zone that the Standard cluster is in. If you have a regional Standard cluster, the zone in which the GPU accelerator type is available must be in the region that the cluster is in. When you create the node pool, you specify the accelerator type and the zones for the nodes. If you specify an accelerator type that isn't available in the cluster's location, the node pool creation fails.

Run the following commands to get your cluster's location and a supported GPU accelerator type.

  1. Get the location that the cluster is in:

    gcloud container clusters list
    

    The output is similar to the following:

    NAME                LOCATION  MASTER_VERSION      MASTER_IP     MACHINE_TYPE  NODE_VERSION        NUM_NODES  STATUS   STACK_TYPE
    example-cluster-1   us-west2  1.33.2-gke.1111000  34.102.3.122  e2-medium     1.33.2-gke.1111000  9          RUNNING  IPV4
    
  2. List the GPU accelerator types that are available in the location, excluding Virtual Workstations:

    gcloud compute accelerator-types list | grep LOCATION_NAME | grep -v "Workstation"
    

    Replace LOCATION_NAME with the cluster's location.

    For example, to get a list of GPU accelerator types in the us-west2 region, run the following command:

    gcloud compute accelerator-types list | grep us-west2 | grep -v "Workstation"
    

    The output is similar to the following:

    nvidia-b200            us-west2-c                 NVIDIA B200 180GB
    nvidia-tesla-p4        us-west2-c                 NVIDIA Tesla P4
    nvidia-tesla-t4        us-west2-c                 NVIDIA T4
    nvidia-tesla-p4        us-west2-b                 NVIDIA Tesla P4
    nvidia-tesla-t4        us-west2-b                 NVIDIA T4
    

Choose a compatible machine type

If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.

After you know which GPUs are available in the cluster's location, you can determine the compatible machine types. Trusted Cloud by S3NS restricts GPUs to specific machine series. Use the following steps to find a machine type:

  1. Refer to the GPU models available table.
  2. Locate the row for the GPU accelerator type you have chosen.
  3. Look at the "Machine series" column for that row. This column tells you which machine series you must use.
  4. To see the machine type names you can specify, click the link on the machine series.

The only exception is the N1 machine series; for N1, the table provides additional guidance on which N1 machine types you can use with your chosen accelerator type.

Before using an accelerator-optimized machine, make sure that it's supported with flex-start provisioning mode, as shown in Consumption option availability by machine type.
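
For example, to confirm that a machine type from the chosen series is offered in one of your node zones, you can filter the machine type list the same way you filtered the accelerator list earlier in this guide. In the following sketch, us-west2-b and g2-standard are example values:

gcloud compute machine-types list | grep us-west2-b | grep g2-standard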

Determine the accelerator count

If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.

To create a node pool, you need to determine the number of accelerators to attach to each node in the node pool. Valid values depend on your accelerator type and machine type. Each machine type has a limit on how many GPUs it can support. To determine what value to use (besides the default of 1):

  1. Refer to GPU machine types.
  2. In the table, find the row for your machine series and locate your accelerator type.
  3. Use the value in the "GPU count" column. Optionally, you can verify the count from the command line, as shown after these steps.
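
For accelerator-optimized machine series such as G2, you can also inspect a specific machine type directly; the output of the following sketch includes an accelerators field that shows the attached GPU type and count. Here, g2-standard-8 and us-central1-a are example values:

gcloud compute machine-types describe g2-standard-8 --zone us-central1-a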

Create a node pool with flex-start

If you use a cluster in Autopilot mode, skip this section and go to the Run a batch workload section.

To create a node pool with flex-start enabled on an existing Standard cluster, you can use the gcloud CLI or Terraform.

gcloud

  1. Create a node pool with flex-start:

    gcloud container node-pools create NODE_POOL_NAME \
        --cluster CLUSTER_NAME \
        --location LOCATION_NAME \
        --project PROJECT_ID \
        --accelerator type=ACCELERATOR_TYPE,count=COUNT \
        --machine-type MACHINE_TYPE \
        --max-run-duration MAX_RUN_DURATION \
        --flex-start \
        --node-locations NODE_ZONES \
        --num-nodes 0 \
        --enable-autoscaling \
        --total-min-nodes 0 \
        --total-max-nodes 5 \
        --location-policy ANY \
        --reservation-affinity none \
        --no-enable-autorepair
    

    Replace the following:

    • NODE_POOL_NAME: the name you choose for your node pool.
    • CLUSTER_NAME: the name of the Standard cluster you want to modify.
    • LOCATION_NAME: the compute region for the cluster control plane.
    • PROJECT_ID: your project ID.
    • ACCELERATOR_TYPE: the specific type of accelerator (for example, nvidia-tesla-t4 for NVIDIA T4) to attach to the instances.
    • COUNT: the number of accelerators to attach to the instances. The default value is 1.
    • MACHINE_TYPE: the type of machine to use for nodes.
    • MAX_RUN_DURATION: optional. The maximum runtime of a node in seconds, up to the default of seven days. The number that you enter must end in s. For example, to specify one day, enter 86400s.
    • NODE_ZONES: a comma-separated list of one or more zones where GKE creates the node pool.

    In this command, the --flex-start flag instructs gcloud to create a node pool with flex-start enabled.

    GKE creates a node pool with nodes that have the specified number of accelerators attached. The node pool initially has zero nodes, and autoscaling is enabled.
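
    For example, with the g2-standard-8 machine type and nvidia-l4 accelerator used elsewhere in this guide, and the hypothetical names example-cluster and dws-l4-pool, the command might look like the following. The 604800s value is the seven-day maximum run duration.

    gcloud container node-pools create dws-l4-pool \
        --cluster example-cluster \
        --location us-central1 \
        --accelerator type=nvidia-l4,count=1 \
        --machine-type g2-standard-8 \
        --max-run-duration 604800s \
        --flex-start \
        --node-locations us-central1-a,us-central1-b \
        --num-nodes 0 \
        --enable-autoscaling \
        --total-min-nodes 0 \
        --total-max-nodes 5 \
        --location-policy ANY \
        --reservation-affinity none \
        --no-enable-autorepair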

  2. Verify the status of flex-start in the node pool:

    gcloud container node-pools describe NODE_POOL_NAME \
        --cluster CLUSTER_NAME \
        --location LOCATION_NAME \
        --format="get(config.flexStart)"
    

    If flex-start is enabled in the node pool, the flexStart field is set to True.

Terraform

You can use flex-start with GPUs by using a Terraform module.

  1. Add the following block to your Terraform configuration:
resource "google_container_node_pool" " "gpu_dws_pool" {
name = "gpu-dws-pool"

queued_provisioning {
    enabled = false
}

}
node_config {
    machine_type = "MACHINE_TYPE"
    accelerator_type = "ACCELERATOR_TYPE"
    accelerator_count = COUNT
    node_locations = ["NODE_ZONES"]
    flex_start = true
}

Replace the following:

  • MACHINE_TYPE: the type of machine to use for nodes.
  • ACCELERATOR_TYPE: the specific type of accelerator (for example, nvidia-tesla-t4 for NVIDIA T4) to attach to the instances.
  • COUNT: the number of accelerators to attach to the instances. The default value is 1.
  • NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.

Terraform calls Trusted Cloud APIs to create a node pool in your cluster that uses flex-start with GPUs. The node pool initially has zero nodes, and autoscaling is enabled. To learn more about Terraform, see the google_container_node_pool resource spec on terraform.io.
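
After you add the block and replace the placeholders, you can apply the configuration with the standard Terraform workflow:

terraform init
terraform plan
terraform apply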

Run a batch workload

In this section, you create two Kubernetes Jobs that require one GPU each. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.

  1. In the Trusted Cloud console, launch a Cloud Shell session by clicking Activate Cloud Shell. A session opens in the bottom pane of the Trusted Cloud console.

  2. Create a file named dws-flex-start.yaml:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-1
    spec:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-flex-start: "true"
            cloud.google.com/gke-gpu-accelerator: ACCELERATOR_TYPE
          containers:
          - name: container-1
            image: gcr.io/k8s-staging-perf-tests/sleep:latest
            args: ["10s"] # Sleep for 10 seconds
            resources:
              requests:
                  nvidia.com/gpu: 1
              limits:
                  nvidia.com/gpu: 1
          restartPolicy: OnFailure
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-2
    spec:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-flex-start: "true"
          containers:
          - name: container-2
            image: gcr.io/k8s-staging-perf-tests/sleep:latest
            args: ["10s"] # Sleep for 10 seconds
            resources:
              requests:
                  nvidia.com/gpu: 1
              limits:
                  nvidia.com/gpu: 1
          restartPolicy: OnFailure

    Replace ACCELERATOR_TYPE with the GPU accelerator type that you chose earlier, for example nvidia-l4.

  3. Apply the dws-flex-start.yaml manifest:

    kubectl apply -f dws-flex-start.yaml
    
  4. Verify that the Jobs are running on the same node:

    kubectl get pods -l "job-name in (job-1,job-2)" -o wide
    

    The output is similar to the following:

    NAME    READY   STATUS      RESTARTS   AGE   IP       NODE               NOMINATED NODE   READINESS GATES
    job-1   0/1     Completed   0          19m   10.(...) gke-flex-zonal-a2  <none>           <none>
    job-2   0/1     Completed   0          19m   10.(...) gke-flex-zonal-a2  <none>           <none>
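
    Optionally, to block until both Jobs report the Complete condition before you clean up, you can run a kubectl wait command like the following. The 10-minute timeout is an arbitrary example:

    kubectl wait --for=condition=complete job/job-1 job/job-2 --timeout=600s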
    

Clean up

To avoid incurring charges to your Trusted Cloud by S3NS account for the resources that you used on this page, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Trusted Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the individual resources

  1. Delete the Jobs:

    kubectl delete job -l "job-name in (job-1,job-2)"
    
  2. Delete the node pool:

    gcloud container node-pools delete NODE_POOL_NAME \
          --cluster CLUSTER_NAME \
          --location LOCATION_NAME
    
  3. Delete the cluster:

    gcloud container clusters delete CLUSTER_NAME \
          --location LOCATION_NAME
    

What's next