Some or all of the information on this page might not apply to Cloud de Confiance by S3NS. For details, see "Differences from Google Cloud".

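Cloud TPU resources in Compute Engine

You can use Compute Engine resources to create and manage Tensor Processing Units (TPUs). This page gives a conceptual overview of how to use TPUs with Compute Engine. It maps TPU concepts to Compute Engine resources and outlines the high-level workflow for creating TPU resources.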

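Key TPU concepts

To manage TPU resources in Compute Engine, it helps to understand the following key TPU concepts:

TPU VM: A virtual machine that is directly connected to TPU hardware. A single TPU VM is equivalent to a single-host slice.
TPU slice: A logical group of interconnected TPU chips that you access through one or more TPU VMs. A slice has one of the following scopes:
- Single-host slice: A slice that consists of one host machine. A single-host slice is a single TPU VM.
- Multi-host slice: A slice that consists of multiple TPU VMs connected to each other through the high-speed inter-chip interconnect (ICI) network.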
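
To make the relationship between TPU VMs and slice scopes concrete, here is a minimal Python sketch. The `TpuSlice` class, its fields, and the VM names are hypothetical illustrations of the concepts above; they are not part of any Google Cloud API or client library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TpuSlice:
    """Hypothetical model of a TPU slice: interconnected chips reached through TPU VMs."""

    name: str
    # Names of the TPU VMs (hosts) through which the slice's chips are accessed.
    tpu_vms: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A slice is always accessed through at least one TPU VM.
        if not self.tpu_vms:
            raise ValueError("a TPU slice needs at least one TPU VM")

    @property
    def scope(self) -> str:
        # One host means a single-host slice (equivalent to one TPU VM);
        # more than one host means a multi-host slice whose VMs are linked over ICI.
        return "single-host" if len(self.tpu_vms) == 1 else "multi-host"


single = TpuSlice("inference-slice", ["tpu-vm-0"])
multi = TpuSlice("training-slice", ["tpu-vm-0", "tpu-vm-1", "tpu-vm-2", "tpu-vm-3"])
print(single.name, single.scope)  # inference-slice single-host
print(multi.name, multi.scope)    # training-slice multi-host
```

The scope is derived from the number of hosts rather than stored separately, so a slice's scope can never disagree with its list of TPU VMs.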

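TPU and Compute Engine concept mapping

The following table describes how TPU concepts map to Compute Engine resources:

Cloud TPU concept	Compute Engine resource	Resource details	Use case
TPU VM	VM instance	A Compute Engine VM with direct access to TPU hardware.	Individual VM tasks, running SSH commands, or debugging
TPU single-host slice	VM instance, or managed instance group (MIG) with a single VM	A configuration that consists of one physical host machine.	Inference with autoscaling
TPU multi-host slice	MIG with an accelerator topology specified in a workload policy	A group of TPU VMs interconnected through ICI that is managed as a single logical unit.	Large-scale distributed training that requires atomic provisioning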
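
The mapping in the table above can also be expressed as a lookup structure, for example in an internal tool that decides which Compute Engine resource to provision for a given slice shape. The following Python sketch encodes only the table's rows; the dictionary keys and the `compute_engine_resource` helper are hypothetical names chosen for illustration, and the code does not call any Google Cloud API.

```python
from typing import Dict, Tuple

# Hypothetical encoding of the concept-mapping table:
# Cloud TPU concept -> (Compute Engine resource, typical use case).
TPU_TO_COMPUTE_ENGINE: Dict[str, Tuple[str, str]] = {
    "tpu-vm": (
        "VM instance",
        "Individual VM tasks, running SSH commands, or debugging",
    ),
    "single-host-slice": (
        "VM instance or MIG with a single VM",
        "Inference with autoscaling",
    ),
    "multi-host-slice": (
        "MIG with an accelerator topology specified in a workload policy",
        "Large-scale distributed training that requires atomic provisioning",
    ),
}


def compute_engine_resource(num_tpu_vms: int) -> str:
    """Return the Compute Engine resource for a slice reached through num_tpu_vms TPU VMs."""
    if num_tpu_vms < 1:
        raise ValueError("a slice needs at least one TPU VM")
    key = "single-host-slice" if num_tpu_vms == 1 else "multi-host-slice"
    resource, _use_case = TPU_TO_COMPUTE_ENGINE[key]
    return resource


print(compute_engine_resource(1))  # VM instance or MIG with a single VM
print(compute_engine_resource(4))  # MIG with an accelerator topology specified in a workload policy
```

A multi-host slice maps to one MIG rather than to separate VM instances because the table describes its VMs as a single logical unit that is provisioned atomically.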

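Migrate from the Cloud TPU API

We're ending development of the Cloud TPU API, including the Google Cloud CLI for the Cloud TPU API and the Cloud Client Libraries for the Cloud TPU API. The Cloud TPU API will receive only bug fixes and security updates. Starting with TPU7x (Ironwood), new hardware generations are supported only through Compute Engine or Google Kubernetes Engine (GKE). To use the latest features and get support for the latest TPU versions, migrate by replacing your legacy Cloud TPU API calls with their equivalents in Compute Engine or GKE.

Choose one of the following paths based on your orchestration and workload needs:

Compute Engine: Recommended for users who need direct control at the VM level or custom OS images. To get started provisioning TPUs in Compute Engine, see "Quickstart: Create a TPU VM".
GKE: Recommended for containerized workloads, autoscaling, and large-scale orchestration. To learn more about using TPUs with GKE, see "Introduction to TPUs in GKE".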

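Existing TPU resources

TPU resources created with the Cloud TPU API (Node or QueuedResource REST objects) aren't compatible with Compute Engine or GKE. To start using Compute Engine or GKE, follow these steps:

Rewrite any scripts that use the Cloud TPU API so that they use the Compute Engine or GKE API instead.
Delete the resources by using the Cloud TPU API, and then re-create them by using the Compute Engine or GKE API.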

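Limitations

TPUs in Compute Engine have the following limitations:

TPU versions: Compute Engine supports v5p, v6e, and TPU7x.
Capacity modes: Compute Engine doesn't support the All Capacity mode for TPUs.
Multislice: Compute Engine can't create groups of interconnected multi-host TPU slices. To use Multislice, you must use Google Kubernetes Engine (GKE). For details, see "Deploy TPU Multislices in GKE".
Collections: Compute Engine doesn't support collection scheduling. To use collection scheduling, you must use GKE. For details, see "Collection scheduling" in the GKE documentation.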
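
Before provisioning, a team could check a workload against these limitations. The following Python sketch is a hypothetical pre-flight check, not a Google Cloud tool: the `WorkloadNeeds` fields and the `compute_engine_blockers` function are illustrative names, and the version set simply mirrors the list above, which may change over time.

```python
from dataclasses import dataclass
from typing import List

# TPU versions that the limitations list names as supported in Compute Engine.
COMPUTE_ENGINE_TPU_VERSIONS = {"v5p", "v6e", "TPU7x"}


@dataclass
class WorkloadNeeds:
    """Hypothetical summary of what a TPU workload requires."""

    tpu_version: str
    all_capacity_mode: bool = False
    multislice: bool = False
    collection_scheduling: bool = False


def compute_engine_blockers(needs: WorkloadNeeds) -> List[str]:
    """List the Compute Engine limitations this workload runs into (empty list if none)."""
    blockers = []
    if needs.tpu_version not in COMPUTE_ENGINE_TPU_VERSIONS:
        blockers.append(f"TPU version {needs.tpu_version} is not supported in Compute Engine")
    if needs.all_capacity_mode:
        blockers.append("All Capacity mode is not supported in Compute Engine")
    if needs.multislice:
        blockers.append("Multislice requires GKE")
    if needs.collection_scheduling:
        blockers.append("Collection scheduling requires GKE")
    return blockers


training = WorkloadNeeds(tpu_version="v6e", multislice=True)
inference = WorkloadNeeds(tpu_version="TPU7x")
print(compute_engine_blockers(training))   # ['Multislice requires GKE']
print(compute_engine_blockers(inference))  # []
```

The check reports every blocker instead of stopping at the first one, so a single run shows everything that would have to change (or move to GKE) before the workload fits Compute Engine.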
