This document shows you how to use automated networking for accelerator VMs, such as GPUs and TPUs, to simplify the network configuration for Google Kubernetes Engine (GKE) accelerator workloads. This automation is essential for running artificial intelligence (AI), machine learning (ML), and high performance computing (HPC) workloads on accelerator-optimized machines.
This document assumes familiarity with fundamental GKE concepts, GPU and TPU workloads, and VPC networking.
This page is intended for Cloud architects and Networking specialists who design and architect their organization's network. For an overview of all GKE documentation sets, see Explore GKE documentation. To learn more about common roles and example tasks referenced in Cloud de Confiance by S3NS content, see Common GKE user roles and tasks.
GKE simplifies running high-performance AI and ML on specialized accelerators. With automated networking for accelerator VMs, you can enable high-speed, multi-network connectivity—essential for protocols like RDMA—with a single configuration flag. This automation eliminates the complex, manual process of setting up multiple VPC networks, managing IP address ranges, and configuring network interfaces for every node pool and Pod. By using a single parameter when creating a node pool, GKE provides all the necessary cloud and Kubernetes networking resources.
Terminology
The following terms are key to understanding the networking architecture for accelerator VMs.
- Virtual Private Cloud (VPC): a VPC is a virtual version of a physical network, implemented inside of Google's production network. It provides connectivity for your Compute Engine virtual machine (VM) instances, GKE clusters, and other resources.
- Titanium NIC: a smart NIC that offloads network processing tasks from the CPU, freeing the CPU to focus on your workloads. On GPU machines, Titanium NICs handle all traffic that is not direct GPU-to-GPU communication. On TPU machines, all NICs are Titanium NICs.
- Subnetwork: a subnetwork is a segmented piece of a larger VPC. Each subnetwork is associated with a region and has a defined IP address range.
- Network Interface Controller (NIC): a NIC is a virtual network interface that connects a VM instance to a network. Each NIC is attached to a specific VPC and subnetwork.
- Host network: the primary network used by the node's main network interfaces (NICs) for general cluster communication, such as control plane traffic and regular Pod networking.
- Data network: a dedicated network for high-performance data transfer between accelerator VMs. For GPUs, this is often a GPUDirect VPC with RDMA. For TPUs, this might be a second host network.
- Remote Direct Memory Access (RDMA): RDMA is a technology that allows network devices to exchange data directly with the main memory of a computer without involving the operating system or CPU. This significantly reduces latency and improves throughput, which is critical for HPC and ML workloads.
- NVLink: NVLink is a high-speed interconnect technology developed by NVIDIA to connect multiple GPUs within a single node, enabling them to share memory and work together on large datasets.
- Kubernetes dynamic resource allocation (DRA): DRA is a Kubernetes feature that provides a more flexible way for Pods to request and consume resources, such as GPUs and other specialized hardware. It allows for fine-grained control over resource allocation.
How automated networking works
Accelerator-optimized machines have a specialized network architecture to support high-throughput, low-latency communication between GPUs and TPUs. Each physical machine contains multiple GPUs or TPUs, often connected by high-speed interconnects like NVLink. The machines also have one or more NICs for general networking and multiple GPU NICs for high-speed interconnects.
When you create a GKE node that uses an accelerator-optimized machine type, GKE configures multiple NICs on the underlying VM. Host NICs connect to host VPC networks for general cluster communication and management, such as traffic to and from the control plane. GPU NICs connect to a dedicated, high-performance VPC network, often with RDMA enabled and a high MTU setting (8896), to facilitate GPUDirect communication.
When a Pod requests GPUs or TPUs, you can configure it to access the high-performance network interfaces on the node. You can request all available NICs or a specific subset. Each claimed network interface is dedicated to a single Pod and isn't shared. This network configuration ensures the Pod has sole access to the full bandwidth and resources of that interface, a key benefit for performance-sensitive workloads.
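To see how this looks on a node's underlying VM, you can inspect the VM's network interfaces. The following is a minimal sketch, not part of the configuration steps in this document; NODE_VM_NAME and ZONE are placeholders, and the Compute Engine instance name of a GKE node matches its Kubernetes node name.
# List the NICs attached to a node VM (NODE_VM_NAME and ZONE are placeholders).
gcloud compute instances describe NODE_VM_NAME \
    --zone=ZONE \
    --format="yaml(networkInterfaces)"
The output lists each NIC with its attached network and subnetwork, which lets you confirm that the host and data networks described in this section were attached.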
Limitations
- Automated networking for accelerator VMs is not supported on Autopilot clusters.
- Automated networking requires the cluster to use GKE Dataplane V2.
- Supported machine types: Automated networking is supported on A3, A4, and TPU Trillium (v6e) accelerator-optimized machine families.
- Single-zone node pools required: You must use a node pool with a single zone.
- When using GKE managed DRANET to configure workloads, see the key considerations and limitations for GKE managed DRANET.
- You can't use both the multi-network API and DRANET in the same node pool. You must choose one network attachment method for your Pods.
Accelerator-optimized machines network configurations
Accelerator-optimized machines have varying network configurations depending on their type. The following tables summarize the network specifications for various machine types.
GPU accelerator VMs
| Machine type | Number of GPUs | Number of Titanium NICs | Number of GPU NICs | GPUDirect Technology | Additional VPCs |
|---|---|---|---|---|---|
| A3 | 8 (H100) | 1 | 4 | TCPX | 4 for GPU NICs |
| A3 Mega | 8 (H100) | 1 | 8 | TCPXO | 8 for GPU NICs |
| A3 Ultra | 8 (H200) | 2 | 8 | RDMA | 2 (1 for second NIC, 1 for GPU NICs) |
| A4 | 8 (B200) | 2 | 8 | RDMA | 2 (1 for second NIC, 1 for GPU NICs) |
| A4X | 4 (GB200) | 1 | 4 | RDMA | 2 (1 for second NIC, 1 for GPU NICs) |
TPU accelerator VMs
| Machine type | Number of TPU chips | Number of NICs | Additional VPCs |
|---|---|---|---|
| TPU Trillium (v6e) (ct6e-standard-4t) | 4 | 2 | 2 (1 for 2nd NIC, 1 for extra VNIC on 1st NIC) |
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
- Ensure that your cluster uses GKE version 1.34.1-gke.1829001 or later.
- Ensure that your cluster uses GKE Dataplane V2. You can enable this feature when you create a new cluster or update an existing one (see the verification sketch after this list).
  Create a new cluster:
  gcloud container clusters create CLUSTER_NAME \
      --cluster-version=CLUSTER_VERSION \
      --enable-dataplane-v2
  Replace the following:
  - CLUSTER_NAME: the name of your new cluster.
  - CLUSTER_VERSION: the version of your cluster, which must be 1.34.1-gke.1829001 or later.
  Update an existing cluster:
  gcloud container clusters update CLUSTER_NAME \
      --enable-dataplane-v2
  Replace CLUSTER_NAME with the name of your cluster.
- If you plan to deploy GPU workloads that use RDMA, verify that the DeviceClass resources exist:
  kubectl get deviceclass mrdma.google.com
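To confirm the version and dataplane prerequisites, you can check the cluster's current version and datapath provider. This is a minimal verification sketch; CLUSTER_NAME and LOCATION are placeholders, and a datapathProvider value of ADVANCED_DATAPATH indicates that GKE Dataplane V2 is enabled.
# Print the cluster version and datapath provider (placeholders: CLUSTER_NAME, LOCATION).
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(currentMasterVersion,networkConfig.datapathProvider)"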
Create a node pool with a default network profile
To automatically create a network that connects all GPU or TPU machines within a single zone, create a node pool with the auto accelerator network profile.
gcloud
To create a node pool with an automatically configured network profile, run the following command:
gcloud beta container node-pools create NODE_POOL_NAME \
--accelerator-network-profile=auto \
--node-locations=ZONE \
--machine-type=MACHINE_TYPE
For more information about creating node pools with accelerators, see Run GPUs in Standard node pools and Deploy TPU workloads in GKE Standard. When you follow the instructions in these documents, append the --accelerator-network-profile=auto flag to the gcloud container node-pools create command.
For multi-host TPU slice node pools, you also need to add the --tpu-topology flag.
Replace the following:
- NODE_POOL_NAME: the name of your new node pool.
- ZONE: the zone for the node pool.
- MACHINE_TYPE: the machine type for the nodes, for example, a3-ultragpu-8g.
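For example, the following sketch creates an A3 Ultra node pool with the auto profile. The cluster, region, zone, and node pool names are hypothetical, and depending on how you obtain capacity you might also need accelerator, reservation, or flex-start flags, as shown in the GPU node pool example later in this document.
# Hypothetical example values: gpu-a3u-pool, my-cluster, us-central1, us-central1-a.
gcloud beta container node-pools create gpu-a3u-pool \
    --cluster=my-cluster \
    --location=us-central1 \
    --node-locations=us-central1-a \
    --machine-type=a3-ultragpu-8g \
    --accelerator-network-profile=auto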
REST
In a request to the nodePools.create method, specify the
accelerator_network_profile field:
{
"nodePool": {
"name": "NODE_POOL_NAME",
"machineType": "MACHINE_TYPE",
...
"accelerator_network_profile": "auto"
}
}
Replace the following:
- NODE_POOL_NAME: the name of your new node pool.
- MACHINE_TYPE: the machine type for the nodes, for example, a3-ultragpu-8g.
Schedule a workload that uses GPUs
The following sections show you how to configure a GPU node pool and workload to use RDMA network interfaces with GKE managed DRANET. For more details, see Allocate network resources using GKE managed DRANET.
Enable GKE managed DRANET driver on a GPU node pool
To enable the GKE DRANET driver on a GPU node pool that supports
RDMA, add the cloud.google.com/gke-networking-dra-driver=true label when you
create the node pool.
gcloud beta container node-pools create NODE_POOL_NAME \
--region=REGION \
--cluster=CLUSTER_NAME \
--node-locations=NODE_LOCATIONS \
--accelerator type=ACCELERATOR_TYPE,count=ACCELERATOR_COUNT,gpu-driver-version=DRIVER_VERSION \
--machine-type=MACHINE_TYPE \
--num-nodes=NUM_NODES \
--reservation-affinity=specific \
--reservation=projects/RESERVATION_PROJECT/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK \
--accelerator-network-profile=auto \
--node-labels=cloud.google.com/gke-networking-dra-driver=true
Replace the following:
- NODE_POOL_NAME: the name of your new node pool.
- REGION: the Cloud de Confiance region for your cluster.
- CLUSTER_NAME: the name of your cluster.
- NODE_LOCATIONS: the Cloud de Confiance zones for the nodes in the node pool.
- ACCELERATOR_TYPE: the type of GPU accelerator. For example:
  - A4 VMs: enter nvidia-b200.
  - A3 Ultra VMs: enter nvidia-h200-141gb.
- ACCELERATOR_COUNT: the number of GPUs to attach to nodes in the node pool. For example, for both a4-highgpu-8g and a3-ultragpu-8g VMs, the number of GPUs is 8.
- DRIVER_VERSION: the GPU driver version to use. For example, default or latest.
- MACHINE_TYPE: the machine type for the node pool, for example, a3-ultragpu-8g.
- NUM_NODES: the number of nodes for the node pool. For flex-start, this value must be set to 0.
- RESERVATION_PROJECT: the project ID of the reservation.
- RESERVATION_NAME: the name of your reservation. To find this value, see View future reservation requests.
- RESERVATION_BLOCK: the name of a specific block within the reservation. To find this value, see View future reservation requests.
This command uses accelerator network profiles to automatically configure VPC networks and subnets for your accelerator VMs. Alternatively, you can explicitly specify your VPC network and subnets.
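If you prefer the explicit approach, the multi-networking flags on gcloud container node-pools create let you attach pre-created networks and subnets yourself. The following is a sketch only: the network, subnet, and secondary range names are hypothetical, it shows a single additional NIC, and real accelerator topologies repeat these flags once per GPU NIC.
# Hypothetical names: my-rdma-net, my-rdma-subnet-0, rdma-range-0.
gcloud beta container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --machine-type=MACHINE_TYPE \
    --additional-node-network network=my-rdma-net,subnetwork=my-rdma-subnet-0 \
    --additional-pod-network subnetwork=my-rdma-subnet-0,pod-ipv4-range=rdma-range-0,max-pods-per-node=4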
Deploy a workload with RDMA resources
To allocate RDMA resources for a Pod, specify a ResourceClaimTemplate.
1. Create a ResourceClaimTemplate to define how to allocate the RDMA devices. The following manifest requests all available mrdma devices on the node. Save the manifest as all-mrdma-template.yaml:

   apiVersion: resource.k8s.io/v1
   kind: ResourceClaimTemplate
   metadata:
     name: all-mrdma
   spec:
     spec:
       devices:
         requests:
         - name: req-mrdma
           exactly:
             deviceClassName: mrdma.google.com
             allocationMode: All

2. Apply the manifest:

   kubectl apply -f all-mrdma-template.yaml

3. Deploy your workload and reference the ResourceClaimTemplate. The following manifest deploys a Pod that references the all-mrdma template, which grants the Pod access to the RDMA interfaces on the node. Save the manifest as agnhost-rdma-pod.yaml:

   apiVersion: v1
   kind: Pod
   metadata:
     name: agnhost-rdma
     namespace: default
     labels:
       app: agnhost
   spec:
     containers:
     - name: agnhost
       image: registry.k8s.io/e2e-test-images/agnhost:2.39
       args: ["netexec", "--http-port", "80"]
       ports:
       - name: agnhost-port
         containerPort: 80
       resources:
         claims:
         - name: rdma
         limits:
           nvidia.com/gpu: 1
     resourceClaims:
     - name: rdma
       resourceClaimTemplateName: all-mrdma

4. Apply the manifest:

   kubectl apply -f agnhost-rdma-pod.yaml

5. Verify that the additional allocated network interfaces are visible inside the Pod:

   kubectl exec agnhost-rdma -- ls /sys/class/net

   The following example output shows the default eth0 and lo interfaces, as well as the allocated RDMA interfaces, such as gpu0rdma0. The number and names of the network interfaces (NICs) vary based on the GKE node's machine type.

   eth0 gpu0rdma0 gpu1rdma0 gpu2rdma0 gpu3rdma0 lo
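To confirm how the claim was resolved, you can also inspect the ResourceClaim that Kubernetes generates from the template. This is a general dynamic resource allocation check rather than a feature-specific command; the generated claim name is derived from the Pod name and the claim name.
# List generated ResourceClaims in the Pod's namespace.
kubectl get resourceclaims -n default
The STATE column shows whether each claim is allocated and reserved for the agnhost-rdma Pod.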
Schedule a workload that uses TPUs
The following sections show you how to configure a TPU node pool and workload to use non-RDMA network interfaces with GKE managed DRANET. For more details, see Allocate network resources using GKE managed DRANET.
Verify networking DeviceClasses
Verify that the DeviceClass resources for networking exist in your cluster.
kubectl get deviceclass netdev.google.com
The output is similar to the following:
NAME AGE
netdev.google.com 2d22h
Enable GKE managed DRANET driver on a TPU slice node pool
To enable the GKE DRANET driver when creating a TPU slice node pool, add the cloud.google.com/gke-networking-dra-driver=true label.
gcloud beta container node-pools create NODE_POOL_NAME \
--location=LOCATION \
--cluster=CLUSTER_NAME \
--node-locations=NODE_LOCATIONS \
--machine-type=MACHINE_TYPE \
--tpu-topology=TPU_TOPOLOGY \
--num-nodes=NUM_NODES \
--accelerator-network-profile=auto \
--node-labels=cloud.google.com/gke-networking-dra-driver=true
Replace the following:
- NODE_POOL_NAME: the name of your new node pool.
- LOCATION: the Cloud de Confiance region or zone for your cluster.
- CLUSTER_NAME: the name of your cluster.
- NODE_LOCATIONS: the Cloud de Confiance zones for the nodes in the node pool.
- MACHINE_TYPE: the type of machine to use for nodes. For more information about TPU-compatible machine types, see Choose the TPU version.
- TPU_TOPOLOGY: the TPU topology, for example, 2x4x4. The format of the topology depends on the TPU version. To learn more about TPU topologies, see Choose a topology.
- NUM_NODES: the number of nodes in the node pool.
For more information, see Create a single-host TPU slice node pool.
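For example, the following sketch creates a multi-host Trillium (v6e) slice node pool with the GKE DRANET driver label. The cluster, location, and node pool names are hypothetical; a 4x4 topology on ct6e-standard-4t machines corresponds to 16 TPU chips spread across 4 nodes, and you should check TPU availability and quota for your chosen location.
# Hypothetical example values: v6e-netdev-pool, my-cluster, us-east5, us-east5-a.
gcloud beta container node-pools create v6e-netdev-pool \
    --cluster=my-cluster \
    --location=us-east5 \
    --node-locations=us-east5-a \
    --machine-type=ct6e-standard-4t \
    --tpu-topology=4x4 \
    --num-nodes=4 \
    --accelerator-network-profile=auto \
    --node-labels=cloud.google.com/gke-networking-dra-driver=true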
Deploy a workload claiming all network devices
To allocate non-RDMA network devices for a Pod, specify a ResourceClaimTemplate.
1. Create a ResourceClaimTemplate that references the netdev.google.com DeviceClass. The following manifest requests all available non-RDMA network devices on the node. Save the manifest as all-netdev-template.yaml:

   apiVersion: resource.k8s.io/v1
   kind: ResourceClaimTemplate
   metadata:
     name: all-netdev
   spec:
     spec:
       devices:
         requests:
         - name: req-netdev
           exactly:
             deviceClassName: netdev.google.com
             allocationMode: All

2. Apply the manifest:

   kubectl apply -f all-netdev-template.yaml

3. Deploy your workload and reference the ResourceClaimTemplate. The following manifest deploys a Pod that uses the all-netdev template to grant the Pod access to all non-RDMA network devices on the node. Save the manifest as netdev-pod.yaml:

   apiVersion: v1
   kind: Pod
   metadata:
     name: agnhost-netdev
     namespace: default
     labels:
       app: agnhost
   spec:
     containers:
     - name: agnhost
       image: registry.k8s.io/e2e-test-images/agnhost:2.39
       args: ["netexec", "--http-port", "80"]
       ports:
       - name: agnhost-port
         containerPort: 80
       resources:
         claims:
         - name: netdev
         limits:
           google.com/tpu: 4
     nodeSelector:
       cloud.google.com/gke-tpu-accelerator: TPU_ACCELERATOR
       cloud.google.com/gke-tpu-topology: TPU_TOPOLOGY
     resourceClaims:
     - name: netdev
       resourceClaimTemplateName: all-netdev

   Replace the following:
   - TPU_ACCELERATOR: the TPU accelerator type, for example, tpu-v5p-slice.
   - TPU_TOPOLOGY: the TPU topology, for example, 2x4x4.

4. Apply the manifest:

   kubectl apply -f netdev-pod.yaml

5. Verify that the additional allocated network interfaces are visible inside the Pod:

   kubectl exec agnhost-netdev -- ls /sys/class/net

   The following example output shows the default eth0 and lo interfaces, along with the allocated network devices, which have names like eth1 and eth2. The number of NICs and their names vary based on the machine type of the GKE node.

   eth0 eth1 eth2 lo
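To see which network devices are advertised on each node before Pods claim them, you can list the ResourceSlice objects. This is a standard dynamic resource allocation inspection step rather than a feature-specific command.
# List the devices that DRA drivers publish per node.
kubectl get resourceslices
Each slice shows the node, the publishing driver, and the pool of devices available for allocation.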
Troubleshoot
To check the network setup for a node pool, run the following command:
gcloud beta container node-pools describe NODE_POOL_NAME \
--zone=ZONE \
--cluster=CLUSTER_NAME
Replace the following:
- NODE_POOL_NAME: the name of your node pool.
- ZONE: the zone of the node pool.
- CLUSTER_NAME: the name of your cluster.
The output shows the additional networks and subnetworks attached to the node pool.
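To narrow the output to just the networking details, you can project the node pool's network configuration. This sketch assumes the standard node pool API layout, where additional networks typically appear under networkConfig (for example, additionalNodeNetworkConfigs and additionalPodNetworkConfigs); adjust the projection if your output differs.
# Show only the node pool's network configuration.
gcloud beta container node-pools describe NODE_POOL_NAME \
    --zone=ZONE \
    --cluster=CLUSTER_NAME \
    --format="yaml(networkConfig)"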
What's next
- Read about accelerator-optimized machines.
- Learn how to Provision GPUs with DRA.
- Learn how to Set up multi-network support for pods.
- Learn how to Allocate network resources using GKE managed DRANET.