Replication lag
This page describes how to troubleshoot and fix replication lag for Cloud SQL read replicas.
Overview
Cloud SQL read replicas use PostgreSQL streaming replication. Changes are written to the write-ahead log (WAL) in the primary instance. The WAL sender sends the WAL to the WAL receiver in the replica, where it is applied.
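As a quick check, you can see the state of each connected WAL sender by querying pg_stat_replication on the primary instance. This is a minimal sketch; the sent_lsn and replay_lsn column names assume PostgreSQL 10 or later:
-- Run on the primary instance: one row per connected WAL sender.
select application_name, client_addr, state, sent_lsn, replay_lsn
from pg_stat_replication;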
Replication lag can happen in scenarios such as the following:
- The primary instance can't send changes to the replica fast enough.
- The replica can't receive the changes quickly enough.
- The replica can't apply the changes quickly enough.
The first two causes can be monitored with the network_lag metric. The third is observed through the replica_lag metric. A high replica_lag means the replica can't apply replication changes fast enough. The total lag can be observed through the replica_byte_lag metric, which has labels that indicate further details. These metrics are described in the Monitor replication lag section below.
Optimize queries and schema
This section suggests some common query and schema optimizations that you can make to improve replication performance.
Long-running queries in the read replica
Long-running queries in the replica might block replication for Cloud SQL. You might want to have separate replicas for online transaction processing (OLTP) and online analytical processing (OLAP), and send long-running queries only to the OLAP replica.
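To see which queries are currently running long on a replica, you can query pg_stat_activity. This is a sketch; the five-minute threshold is an arbitrary example, so adjust it for your workload:
-- Run on the replica: list non-idle queries that have been running for more than five minutes.
select pid, now() - query_start as duration, state, query
from pg_stat_activity
where state <> 'idle' and now() - query_start > interval '5 minutes'
order by duration desc;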
Consider adjusting the max_standby_archive_delay and max_standby_streaming_delay flags for your replica.
If you suspect that VACUUM is the culprit and cancelling queries is not acceptable, consider setting the hot_standby_feedback flag on the replica.
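You can confirm the values currently in effect on the replica with SHOW, a minimal sketch:
-- Run on the replica to check the current settings.
show max_standby_archive_delay;
show max_standby_streaming_delay;
show hot_standby_feedback;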
For more information, see the PostgreSQL documentation.
Exclusive locks due to DDL
Data definition language (DDL) commands, such as ALTER TABLE and CREATE INDEX, can cause replication lag on the replica because of exclusive locks. To avoid lock contention, consider scheduling DDL execution for times when the query load on the replica is lower.
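To see whether queries on the replica have been cancelled because of conflicts with replayed changes, such as lock conflicts from DDL, you can check pg_stat_database_conflicts. This is a sketch; the view is only populated on standby servers:
-- Run on the replica: counts of cancelled queries per conflict type, per database.
select datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
from pg_stat_database_conflicts;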
For more information, see the PostgreSQL documentation.
Overloaded replica
If a read replica receives too many queries, replication could be blocked.
Consider splitting reads among multiple replicas to reduce the load on each one.
To avoid query spikes, consider throttling replica read queries in your application logic or in a proxy layer, if you use one.
If there are spikes of activity on the primary instance, consider spreading out updates.
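A rough way to gauge how busy a replica is at a given moment is to count backends by state in pg_stat_activity. This is a sketch and not a substitute for the Cloud Monitoring metrics described below:
-- Run on the replica: number of connections in each state.
select state, count(*)
from pg_stat_activity
group by state
order by count(*) desc;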
Monolithic primary database
Consider sharding the primary database vertically (or horizontally) to prevent one or more lagging tables from holding back all the other tables.
Monitor replication lag
You can use the replica_lag and network_lag metrics to monitor replication lag and to identify whether the cause of the lag is in the primary database, the network, or the replica.
Replication lag (cloudsql.googleapis.com/database/replication/replica_lag)
The number of seconds the replica's state is behind the state of the primary instance. It's the difference between the current time and the original timestamp at which the primary committed the transaction that is currently being applied on the replica. In particular, a write can be counted as lagging even if the replica has received it, as long as the replica hasn't yet applied it to the database. This metric is calculated with now() - pg_last_xact_replay_timestamp() on the replica and is an approximation. If replication is broken, the replica doesn't know how far ahead the primary is, so this metric doesn't indicate the total lag.

Lag bytes (cloudsql.googleapis.com/database/postgres/replication/replica_byte_lag)
The number of bytes the replica's state is behind the state of the primary. replica_byte_lag exports 4 time series, and the replica_lag_type label indicates one of the following:
- sent_location: Indicates how many bytes of WAL have been generated but not yet sent to the replica.
- write_location: Write minus sent lag shows the WAL bytes in the network, sent but not yet written to the replica.
- flush_location: Shows the WAL bytes written to the replica but not yet flushed there.
- replay_location: Shows the total lag in bytes. Replay minus flush lag indicates the replay lag.

Network lag (cloudsql.googleapis.com/database/replication/network_lag)
The time, in seconds, that it takes from a commit on the primary to reach the WAL receiver on the replica. If network_lag is zero or negligible but replica_lag is high, it indicates that the WAL receiver can't apply replication changes fast enough.
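If you want to cross-check these metrics directly from PostgreSQL, the following queries are one way to do it. This is a sketch; the LSN column names and pg_wal_lsn_diff() assume PostgreSQL 10 or later:
-- Run on the replica: approximate replication lag in seconds.
select now() - pg_last_xact_replay_timestamp() as replication_lag;

-- Run on the primary: byte lag at each stage, per connected replica.
select application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as sent_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) as write_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) as flush_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
from pg_stat_replication;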
Verify replication
To verify that replication is working, run the following statement on the replica:
select status, last_msg_receipt_time from pg_stat_wal_receiver;
If replication is happening, you see the streaming status and a recent last_msg_receipt_time:
postgres=> select status, last_msg_receipt_time from pg_stat_wal_receiver;
status | last_msg_receipt_time
-----------+-------------------------------
streaming | 2020-01-21 20:19:51.461535+00
(1 row)
If replication is not happening, an empty result is returned:
postgres=> select status, last_msg_receipt_time from pg_stat_wal_receiver;
status | last_msg_receipt_time
--------+-----------------------
(0 rows)
What's next
- Promote replicas for regional migration or disaster recovery
[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["缺少我需要的資訊","missingTheInformationINeed","thumb-down"],["過於複雜/步驟過多","tooComplicatedTooManySteps","thumb-down"],["過時","outOfDate","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["示例/程式碼問題","samplesCodeIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# Replication lag\n\n\u003cbr /\u003e\n\n[MySQL](/sql/docs/mysql/replication/replication-lag \"View this page for the MySQL database engine\") \\| PostgreSQL \\| [SQL Server](/sql/docs/sqlserver/replication/replication-lag \"View this page for the SQL Server database engine\")\n\n\u003cbr /\u003e\n\nThis page describes how to troubleshoot and fix replication lag for Cloud SQL\nread replicas.\n\nOverview\n--------\n\nCloud SQL read replicas use [PostgreSQL streaming replication](https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION). Changes are written to Write-Ahead Log (WAL) in the primary instance. The WAL sender sends the WAL to the WAL receiver in the replica, where they are applied.\n\nReplication lag can happen in a few scenarios, such as:\n\n- The primary instance can't send the changes fast enough to the replica.\n- The replica can't receive the changes quickly enough.\n- The replica can't apply the changes quickly enough.\n\nThe first two reasons above can be monitored with the `network_lag` metric. The third is observed via the `replica_lag` metric. High `replica_lag` means that the replica can't apply replication changes fast enough. The total lag can be observed via `replica_byte_lag` metric, which has labels to indicate further details. These metrics are described in the [Monitor replication lag](#metrics) section below.\n\nOptimize queries and schema\n---------------------------\n\nThis section suggests some common query and schema optimizations you can make to\nimprove replication performance.\n\n### Long-running queries in the read replica\n\nLong-running queries in the replica might block replication for Cloud SQL.\nYou might want to have separate replicas for online transaction processing\n(OLTP) and online analytical processing (OLAP) purposes and only send\nlong-running queries to the OLAP replica.\n\nConsider adjusting the `max_standby_archive_delay` and\n`max_standby_streaming_delay` flags for your replica.\n\nIf you suspect VACUUM is the culprit, and query cancellation is not acceptable,\nconsider setting the `hot_standby_feedback` flag in the replica.\n\nReview [PostgreSQL documentation](https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT) for more information.\n\n### Exclusive locks due to DDL\n\nData definition language (DDL) commands, such as `ALTER TABLE` and\n`CREATE INDEX`, can cause replication lag in the replica due to\nexclusive locks. 
To avoid lock contention, consider scheduling DDL execution\nduring times when the query load is lower on the replicas.\nReview [PostgreSQL documentation](https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT) for more information.\n\n### Overloaded replica\n\nIf a read replica is receiving too many queries, replication could be blocked.\nConsider splitting the reads among multiple replicas to reduce the load on each\none.\n\nTo avoid query spikes, consider throttling replica read queries in your\napplication logic or in a proxy layer if you use one.\n\nIf there are spikes of activity on the primary instance, consider spreading out\nupdates.\n\n### Monolithic primary database\n\nConsider sharding the primary database vertically (or horizontally) to prevent\none or more lagging tables from holding back all the other tables.\n\nMonitor replication lag\n-----------------------\n\nYou can use the `replica_lag` and `network_lag` metrics to monitor replication\nlag and identify whether the cause of the lag is in the primary database,\nthe network, or the replica.\n\nVerify replication\n------------------\n\nTo verify that replication is working, run the following statement against the replica:\n\n\u003cbr /\u003e\n\n select status, last_msg_receipt_time from pg_stat_wal_receiver;\n\nIf replication is happening, you see the status `streaming` and a recent\nlast_msg_receipt_time: \n\n postgres=\u003e select status, last_msg_receipt_time from pg_stat_wal_receiver;\n status | last_msg_receipt_time\n -----------+-------------------------------\n streaming | 2020-01-21 20:19:51.461535+00\n (1 row)\n\nIf replication is not happening, an empty result is returned: \n\n postgres=\u003e select status, last_msg_receipt_time from pg_stat_wal_receiver;\n status | last_msg_receipt_time\n --------+-----------------------\n (0 rows)\n\nWhat's next:\n------------\n\n- [Promote replicas for regional migration or disaster recovery](/sql/docs/postgres/replication/cross-region-replicas)"]]