You can flexibly request devices for your Google Kubernetes Engine (GKE) workloads by using dynamic resource allocation (DRA). This document shows you how to create a ResourceClaimTemplate to request devices, and then create a workload to observe how Kubernetes flexibly allocates the devices to your Pods.
This document is intended for Application operators and Data engineers who run workloads like AI/ML or high performance computing (HPC).
About requesting devices with DRA
When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.
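For reference, a DeviceClass published by a DRA driver typically selects devices by driver name with a CEL expression. The following is a minimal sketch of what such an object might look like; the actual DeviceClasses in your cluster depend on the drivers that are installed:

```
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.nvidia.com
spec:
  selectors:
  # Match every device that the named driver manages.
  - cel:
      expression: device.driver == "gpu.nvidia.com"
```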
To request devices within a DeviceClass, you create one of the following objects:
- ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
- ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.
For more information about ResourceClaims and ResourceClaimTemplates, see When to use ResourceClaims and ResourceClaimTemplates.
The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more information about all of the fields that you can specify, see the ResourceClaimTemplate API reference.
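As a point of comparison, a standalone ResourceClaim (rather than a template) might look like the following sketch, where the object and request names are illustrative. A Pod would reference it through spec.resourceClaims with resourceClaimName instead of resourceClaimTemplateName, and the claim is shared by every Pod that references it rather than created per Pod:

```
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: shared-gpu-claim   # illustrative name
spec:
  devices:
    requests:
    - name: single-gpu
      exactly:
        deviceClassName: gpu.nvidia.com
        allocationMode: ExactCount
        count: 1
```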
Limitations
- Node auto-provisioning isn't supported.
- Autopilot clusters don't support DRA.
- You can't use the following GPU sharing features:
  - Time-sharing GPUs
  - Multi-instance GPUs
  - Multi-process Service (MPS)
Requirements
To use DRA, your GKE cluster must run version 1.34 or later.
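For example, you can check the control plane version of an existing cluster with a command similar to the following, where CLUSTER_NAME and LOCATION are placeholders for your cluster name and location:

```
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(currentMasterVersion)"
```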
You should also be familiar with the limitations listed in the preceding section before you deploy DRA workloads.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
- Ensure that your GKE clusters are configured for DRA workloads.
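To quickly confirm that a cluster is set up for DRA, you can check that the DRA drivers have published DeviceClass objects; the exact names that appear depend on the drivers installed:

```
kubectl get deviceclasses
```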
Use DRA to deploy workloads
To request per-Pod device allocation, you create a ResourceClaimTemplate that has your requested device configuration, such as GPUs of a specific type. When you deploy a workload that references the ResourceClaimTemplate, Kubernetes creates ResourceClaims for each Pod in the workload based on the ResourceClaimTemplate. Kubernetes allocates the requested resources and schedules the Pods on corresponding nodes.
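Optionally, a request can also filter devices by attribute with a CEL selector. The following sketch narrows the request to a specific GPU model; the productName attribute and its value are assumptions here, so check which attributes your DRA driver actually publishes before relying on a selector like this:

```
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: filtered-gpu-claim-template   # illustrative name
spec:
  spec:
    devices:
      requests:
      - name: single-gpu
        exactly:
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 1
          # Assumed attribute name; verify against the attributes your driver exposes.
          selectors:
          - cel:
              expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA L4"
```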
To request devices in a workload with DRA, select one of the following options:
GPU
Save the following manifest as claim-template.yaml:

```
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: single-gpu
        exactly:
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 1
```

Create the ResourceClaimTemplate:

```
kubectl create -f claim-template.yaml
```

To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-gpu-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-gpu-example
  template:
    metadata:
      labels:
        app: dra-gpu-example
    spec:
      containers:
      - name: ctr
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["echo $(nvidia-smi -L || echo Waiting...)"]
        resources:
          claims:
          - name: single-gpu
      resourceClaims:
      - name: single-gpu
        resourceClaimTemplateName: gpu-claim-template
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
```

Deploy the workload:

```
kubectl create -f dra-gpu-example.yaml
```
TPU
Save the following manifest as claim-template.yaml:

```
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: tpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: all-tpus
        exactly:
          deviceClassName: tpu.google.com
          allocationMode: All
```

This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool to every ResourceClaim.

Create the ResourceClaimTemplate:

```
kubectl create -f claim-template.yaml
```

To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-tpu-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-tpu-example
  template:
    metadata:
      labels:
        app: dra-tpu-example
    spec:
      containers:
      - name: ctr
        image: ubuntu:22.04
        command:
        - /bin/sh
        - -c
        - |
          echo "Environment Variables:"
          env
          echo "Sleeping indefinitely..."
          sleep infinity
        resources:
          claims:
          - name: all-tpus
      resourceClaims:
      - name: all-tpus
        resourceClaimTemplateName: tpu-claim-template
      tolerations:
      - key: "google.com/tpu"
        operator: "Exists"
        effect: "NoSchedule"
```

Deploy the workload:

```
kubectl create -f dra-tpu-example.yaml
```
Verify the hardware allocation
You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod. To verify the allocation for GPUs or TPUs, select one of the following options:
GPU
Get the ResourceClaim associated with the workload that you deployed:
```
kubectl get resourceclaims
```

The output is similar to the following:

```
NAME                                               STATE                AGE
dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
```

Get more details about the hardware assigned to the Pod:

```
kubectl describe resourceclaims RESOURCECLAIM
```

Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

The output is similar to the following:

```
Name:         dra-gpu-example-68f595d7dc-prv27-single-gpu-qgjq5
Namespace:    default
Labels:       <none>
Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
API Version:  resource.k8s.io/v1
Kind:         ResourceClaim
Metadata:
  # Multiple lines are omitted here.
Spec:
  Devices:
    Requests:
      Exactly:
        Allocation Mode:    ExactCount
        Count:              1
        Device Class Name:  gpu.nvidia.com
      Name:                 single-gpu
Status:
  Allocation:
    Devices:
      Results:
        Device:   gpu-0
        Driver:   gpu.nvidia.com
        Pool:     gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
        Request:  single-gpu
    Node Selector:
      Node Selector Terms:
        Match Fields:
          Key:       metadata.name
          Operator:  In
          Values:    gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
  Reserved For:
    Name:      dra-gpu-example-68f595d7dc-prv27
    Resource:  pods
    UID:       e16c2813-08ef-411b-8d92-a72f27ebf5ef
Events:        <none>
```

Get logs for the workload that you deployed:

```
kubectl logs deployment/dra-gpu-example --all-pods=true
```

The output is similar to the following:

```
[pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)
```

The output of these steps shows that GKE allocated one GPU to the container.
TPU
Get the ResourceClaim associated with the workload that you deployed:
```
kubectl get resourceclaims | grep dra-tpu-example
```

The output is similar to the following:

```
NAME                                             STATE                AGE
dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh   allocated,reserved   9s
```

Get more details about the hardware assigned to the Pod:

```
kubectl get resourceclaims RESOURCECLAIM -o yaml
```

Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

The output is similar to the following:

```
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  annotations:
    resource.kubernetes.io/pod-claim-name: all-tpus
  creationTimestamp: "2025-03-04T21:00:54Z"
  finalizers:
  - resource.kubernetes.io/delete-protection
  generateName: dra-tpu-example-59b8785697-k9kzd-all-tpus-
  name: dra-tpu-example-59b8785697-k9kzd-all-tpus-gnr7z
  namespace: default
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: dra-tpu-example-59b8785697-k9kzd
    uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
  resourceVersion: "12189603"
  uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
spec:
  devices:
    requests:
    - allocationMode: All
      deviceClassName: tpu.google.com
      name: all-tpus
status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: "0"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "1"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "2"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "3"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "4"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "5"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "6"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "7"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - gke-tpu-2ec29193-bcc0
  reservedFor:
  - name: dra-tpu-example-59b8785697-k9kzd
    resource: pods
    uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
```

Get logs for the workload that you deployed:

```
kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"
```

The output is similar to the following:

```
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3
```

The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.
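If you run several DRA workloads, a compact summary of each ResourceClaim and the devices assigned to it can be convenient; one way to produce it is with a jsonpath query similar to the following:

```
kubectl get resourceclaims -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocation.devices.results[*].device}{"\n"}{end}'
```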