About TPUs on Cloud de Confiance by S3NS

Tensor Processing Units (TPUs) are Google's custom-developed, application-specific integrated circuits (ASICs) designed to accelerate machine learning (ML) and artificial intelligence (AI) workloads. Whether you are training complex foundation models for weeks or running large-scale inference, TPUs offer scalable, specialized computing resources optimized for frameworks like JAX and PyTorch.

Cloud TPUs are engineered to tackle the most demanding AI workloads. The key benefits include:

  • Optimized for matrix computations: TPUs contain Matrix Multiply Units (MXUs), hardware built to execute the large matrix operations at the core of ML algorithms efficiently.

  • High-bandwidth memory (HBM): On-chip high-bandwidth memory lets you train and serve larger models and use larger batch sizes.

  • Massive scalability with slices: TPU chips can be connected in groups called slices. Slices let your workloads scale to thousands of TPU chips for large training jobs.
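To illustrate how HBM capacity and slice size interact, the following is a rough sizing sketch. It assumes that model weights are sharded evenly across the chips in a slice, and it ignores gradients, optimizer state, and activations. The function names, the parameter count, and the per-chip HBM budget passed to `weights_fit` are hypothetical and do not describe any specific TPU version.

```python
# Hypothetical sizing helpers; not part of any Cloud TPU API.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_bytes_per_chip(num_params, dtype, chips_in_slice):
    """Bytes of model weights each chip holds under even sharding."""
    if chips_in_slice < 1:
        raise ValueError("a slice needs at least one chip")
    return num_params * BYTES_PER_PARAM[dtype] / chips_in_slice


def weights_fit(num_params, dtype, chips_in_slice, hbm_bytes_per_chip):
    """True if the sharded weights alone fit in the given per-chip budget."""
    per_chip = weight_bytes_per_chip(num_params, dtype, chips_in_slice)
    return per_chip <= hbm_bytes_per_chip


# Illustrative values only: an 8-billion-parameter model in bfloat16.
params = 8_000_000_000
for chips in (1, 4, 16):
    per_chip = weight_bytes_per_chip(params, "bfloat16", chips)
    print(f"{chips:>2} chips: {per_chip / 1e9:.1f} GB of weights per chip")
```

Under these assumptions, the per-chip weight footprint falls in proportion to the number of chips, which is one reason larger slices can hold larger models.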

When to use TPUs

TPUs are optimized for specific workloads, such as the following:

  • Models dominated by matrix computations
  • Models with no custom PyTorch/JAX operations inside the main training loop
  • Models that train for weeks or months
  • Large models with large effective batch sizes
  • Models with ultra-large embeddings common in advanced ranking and recommendation workloads
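One way to gauge whether a model is "dominated by matrix computations" is arithmetic intensity: floating-point operations performed per byte moved to or from memory. The sketch below compares a square matrix multiplication with an element-wise addition of the same size. It assumes n×n operands, that each operand is read once and the result written once, and 2 bytes per element. The function names are hypothetical, and using arithmetic intensity as a proxy for TPU suitability is an illustrative simplification, not a rule stated by this page.

```python
# Hypothetical helpers for a back-of-the-envelope comparison.

def matmul_intensity(n, bytes_per_element=2):
    """FLOPs per byte for C = A @ B with n x n matrices."""
    flops = 2 * n**3                             # n multiply-adds per output element
    bytes_moved = 3 * n * n * bytes_per_element  # read A and B, write C
    return flops / bytes_moved


def elementwise_intensity(n, bytes_per_element=2):
    """FLOPs per byte for C = A + B with n x n matrices."""
    flops = n * n                                # one add per output element
    bytes_moved = 3 * n * n * bytes_per_element  # read A and B, write C
    return flops / bytes_moved


for n in (128, 1024, 8192):
    print(
        f"n={n:>5}: matmul {matmul_intensity(n):8.1f} FLOPs/byte, "
        f"element-wise {elementwise_intensity(n):.3f} FLOPs/byte"
    )
```

In this model, the intensity of the matrix multiplication grows linearly with n, while the element-wise operation stays constant no matter how large the matrices are.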

TPUs are not suited to the following workloads:

  • Linear algebra programs that require frequent branching or contain many element-wise algebra operations
  • Workloads that require high-precision arithmetic
  • Neural network workloads that contain custom operations in the main training loop
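To see why high-precision arithmetic can be a poor fit, consider bfloat16, a 16-bit floating-point format commonly used by TPU matrix units. That background is not stated on this page and is included here as an assumption. The sketch below emulates the format in plain Python by rounding a float32 bit pattern to its upper 16 bits with round-to-nearest-even. The helper name `to_bfloat16` is hypothetical, and the code handles only finite values within float32 range.

```python
import struct


def to_bfloat16(x):
    """Round x to the nearest bfloat16 value, returned as a Python float.

    Hypothetical helper for illustration; NaN and infinity are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that are dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (1.0, 1.001, 1.01, 257.0):
    print(f"{value!r:>7} -> {to_bfloat16(value)!r}")
```

With only 8 bits of significand precision, values such as 1.001 and 257.0 are not representable and round to neighboring values, which is acceptable for many ML models but not for numerically sensitive workloads.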

Provisioning options on Cloud de Confiance by S3NS

You can access and provision TPUs by using the following Cloud de Confiance by S3NS products, depending on your operational needs.

Compute Engine

Compute Engine lets you create and manage individual TPU VMs or slices, giving you full lifecycle management of TPU VMs. Google recommends that you use Compute Engine over the legacy Cloud TPU API to provision your TPU resources.

To learn more, see Cloud TPU resources in Compute Engine.

Google Kubernetes Engine

Google Kubernetes Engine (GKE) provides a fully managed, multi-tenant Kubernetes environment for orchestrating large-scale AI workloads. GKE supports TPU node and node pool lifecycle management, including creating, configuring, and deleting TPU VMs.

To learn more, see About TPUs in GKE.

Cloud TPU

The Cloud TPU API, including the Google Cloud CLI and Cloud Client Libraries for Cloud TPU, is no longer under development. For provisioning and managing TPU resources, Google recommends that you use Compute Engine or GKE, based on your orchestration and workload needs.

For more information, see Migrate from the Cloud TPU API.

TPU versions supported in Compute Engine

Compute Engine supports the following TPU versions:

  • TPU7x (Ironwood)
  • TPU v6e (Trillium)
  • TPU v5p

For more information about each TPU version, see TPU machines.

What's next