Replication lag
This page describes how to troubleshoot and fix replication lag for Cloud SQL read replicas.
Overview
Cloud SQL read replicas use PostgreSQL streaming replication. Changes are written to the write-ahead log (WAL) in the primary instance. The WAL sender ships the WAL to the WAL receiver in the replica, where the changes are applied.
Replication lag can happen in a few scenarios, such as:
- The primary instance can't send changes to the replica fast enough.
- The replica can't receive the changes quickly enough.
- The replica can't apply the changes quickly enough.
You can monitor the first two causes above with the network_lag metric. The third cause is observed through the replica_lag metric. A high replica_lag means that the replica can't apply replication changes fast enough. The total lag can be observed with the replica_byte_lag metric, which has labels that indicate further details. These metrics are described in the Monitor replication lag section below.
Optimize queries and schema
This section suggests some common query and schema optimizations you can make to improve replication performance.
Long-running queries in the read replica
Long-running queries in the replica might block replication for Cloud SQL. You might want to use separate replicas for online transaction processing (OLTP) and online analytical processing (OLAP), and send long-running queries only to the OLAP replica.
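To identify long-running queries on a replica, one option is to inspect the standard pg_stat_activity view. The following is a minimal sketch; the 5-minute cutoff is an arbitrary assumption and should be tuned to your workload:

    -- Run on the replica: list non-idle sessions ordered by how long the
    -- current query has been running. The '5 minutes' threshold is illustrative.
    SELECT pid,
           now() - query_start AS runtime,
           state,
           left(query, 80) AS query_snippet
    FROM pg_stat_activity
    WHERE state <> 'idle'
      AND now() - query_start > interval '5 minutes'
    ORDER BY runtime DESC;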
Consider adjusting the max_standby_archive_delay and max_standby_streaming_delay flags for your replica.
If you suspect that VACUUM is the culprit and query cancellation is not acceptable, consider setting the hot_standby_feedback flag in the replica.
Review the PostgreSQL documentation for more information.
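To check what these settings currently resolve to on a replica, you can query pg_settings; this is only a read-only check, since the values themselves are changed through the Cloud SQL database flags configuration:

    -- Run on the replica: show the current conflict-related settings.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('max_standby_archive_delay',
                   'max_standby_streaming_delay',
                   'hot_standby_feedback');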
Exclusive locks due to DDL
Data definition language (DDL) commands, such as ALTER TABLE and CREATE INDEX, can cause replication lag in the replica due to exclusive locks. To avoid lock contention, consider scheduling DDL execution during times when the query load on the replicas is lower.
Review the PostgreSQL documentation for more information.
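One way to see whether replicated DDL (or other recovery activity) is conflicting with queries on the replica is the standard pg_stat_database_conflicts view, which counts queries canceled because of recovery conflicts. A rough sketch, run on the replica:

    -- Run on the replica: cumulative counts of queries canceled due to lock,
    -- snapshot, buffer-pin, and deadlock conflicts since the last stats reset.
    SELECT datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
    FROM pg_stat_database_conflicts
    WHERE datname = current_database();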
Overloaded replica
If a read replica receives too many queries, replication can be blocked. Consider splitting reads among multiple replicas to reduce the load on each one.
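Before splitting reads, you can get a rough sense of how busy a replica is by counting backends per state. This is a minimal sketch; what counts as "too many" depends entirely on your workload and instance size:

    -- Run on the replica: number of backends in each state.
    SELECT state, count(*) AS backends
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY backends DESC;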
To avoid query spikes, consider throttling replica read queries in your application logic, or in a proxy layer if you use one.
If there are spikes of activity on the primary instance, consider spreading out updates.
Monolithic primary database
Consider sharding the primary database vertically (or horizontally) to prevent one or more lagging tables from holding back all the other tables.
Monitor replication lag
You can use the replica_lag and network_lag metrics to monitor replication lag and identify whether the cause of the lag is in the primary database, the network, or the replica.
Replica lag (cloudsql.googleapis.com/database/replication/replica_lag)
The number of seconds that the replica's state lags behind the state of the primary instance. It is the difference between the current time and the original timestamp at which the primary committed the transaction that is currently being applied on the replica. In particular, writes can count as lagging even if the replica has received them, as long as the replica hasn't yet applied them to the database.
This metric is calculated with now() - pg_last_xact_replay_timestamp() on the replica and is an approximation. If replication is broken, the replica doesn't know how far ahead the primary is, and this metric doesn't indicate the total lag.

Lag bytes (cloudsql.googleapis.com/database/postgres/replication/replica_byte_lag)
The number of bytes that the replica's state lags behind the state of the primary. replica_byte_lag exports four time series, and the replica_lag_type label can be any of the following:
- sent_location: indicates how many WAL bytes have been generated but not yet sent to the replica.
- write_location: write minus sent lag shows the WAL bytes that have been sent over the network but not yet written in the replica.
- flush_location: flush minus write lag shows the WAL bytes that have been written but not yet flushed in the replica.
- replay_location: shows the total lag in bytes. Replay minus flush lag indicates the replay lag.

Network lag (cloudsql.googleapis.com/database/replication/network_lag)
The number of seconds it takes for the WAL to get from a commit in the primary database to the WAL receiver in the replica.
If network_lag is zero or negligible but replica_lag is high, it indicates that the WAL receiver can't apply the replication changes fast enough.
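If you want to cross-check these metrics from inside PostgreSQL, the following is a rough sketch using standard PostgreSQL statistics views (column names as of PostgreSQL 10 and later), not a Cloud SQL-specific API:

    -- On the replica: approximate replica_lag in seconds.
    SELECT now() - pg_last_xact_replay_timestamp() AS approx_replica_lag;

    -- On the primary: byte lag per replica, broken down the same way as the
    -- replica_byte_lag labels (sent / write / flush / replay).
    SELECT client_addr,
           pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)   AS sent_lag_bytes,
           pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)  AS write_lag_bytes,
           pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)  AS flush_lag_bytes,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;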
Verify replication
To verify that replication is working, run the following statement against the replica:
select status, last_msg_receipt_time from pg_stat_wal_receiver;
If replication is happening, you see the status streaming and a recent last_msg_receipt_time:
postgres=> select status, last_msg_receipt_time from pg_stat_wal_receiver;
status | last_msg_receipt_time
-----------+-------------------------------
streaming | 2020-01-21 20:19:51.461535+00
(1 row)
If replication is not happening, an empty result is returned:
postgres=> select status, last_msg_receipt_time from pg_stat_wal_receiver;
status | last_msg_receipt_time
--------+-----------------------
(0 rows)
What's next
- Promote replicas for regional migration or disaster recovery (/sql/docs/postgres/replication/cross-region-replicas)
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["没有我需要的信息","missingTheInformationINeed","thumb-down"],["太复杂/步骤太多","tooComplicatedTooManySteps","thumb-down"],["内容需要更新","outOfDate","thumb-down"],["翻译问题","translationIssue","thumb-down"],["示例/代码问题","samplesCodeIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-08。"],[],[],null,["# Replication lag\n\n\u003cbr /\u003e\n\n[MySQL](/sql/docs/mysql/replication/replication-lag \"View this page for the MySQL database engine\") \\| PostgreSQL \\| [SQL Server](/sql/docs/sqlserver/replication/replication-lag \"View this page for the SQL Server database engine\")\n\n\u003cbr /\u003e\n\nThis page describes how to troubleshoot and fix replication lag for Cloud SQL\nread replicas.\n\nOverview\n--------\n\nCloud SQL read replicas use [PostgreSQL streaming replication](https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION). Changes are written to Write-Ahead Log (WAL) in the primary instance. The WAL sender sends the WAL to the WAL receiver in the replica, where they are applied.\n\nReplication lag can happen in a few scenarios, such as:\n\n- The primary instance can't send the changes fast enough to the replica.\n- The replica can't receive the changes quickly enough.\n- The replica can't apply the changes quickly enough.\n\nThe first two reasons above can be monitored with the `network_lag` metric. The third is observed via the `replica_lag` metric. High `replica_lag` means that the replica can't apply replication changes fast enough. The total lag can be observed via `replica_byte_lag` metric, which has labels to indicate further details. These metrics are described in the [Monitor replication lag](#metrics) section below.\n\nOptimize queries and schema\n---------------------------\n\nThis section suggests some common query and schema optimizations you can make to\nimprove replication performance.\n\n### Long-running queries in the read replica\n\nLong-running queries in the replica might block replication for Cloud SQL.\nYou might want to have separate replicas for online transaction processing\n(OLTP) and online analytical processing (OLAP) purposes and only send\nlong-running queries to the OLAP replica.\n\nConsider adjusting the `max_standby_archive_delay` and\n`max_standby_streaming_delay` flags for your replica.\n\nIf you suspect VACUUM is the culprit, and query cancellation is not acceptable,\nconsider setting the `hot_standby_feedback` flag in the replica.\n\nReview [PostgreSQL documentation](https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT) for more information.\n\n### Exclusive locks due to DDL\n\nData definition language (DDL) commands, such as `ALTER TABLE` and\n`CREATE INDEX`, can cause replication lag in the replica due to\nexclusive locks. 
To avoid lock contention, consider scheduling DDL execution\nduring times when the query load is lower on the replicas.\nReview [PostgreSQL documentation](https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT) for more information.\n\n### Overloaded replica\n\nIf a read replica is receiving too many queries, replication could be blocked.\nConsider splitting the reads among multiple replicas to reduce the load on each\none.\n\nTo avoid query spikes, consider throttling replica read queries in your\napplication logic or in a proxy layer if you use one.\n\nIf there are spikes of activity on the primary instance, consider spreading out\nupdates.\n\n### Monolithic primary database\n\nConsider sharding the primary database vertically (or horizontally) to prevent\none or more lagging tables from holding back all the other tables.\n\nMonitor replication lag\n-----------------------\n\nYou can use the `replica_lag` and `network_lag` metrics to monitor replication\nlag and identify whether the cause of the lag is in the primary database,\nthe network, or the replica.\n\nVerify replication\n------------------\n\nTo verify that replication is working, run the following statement against the replica:\n\n\u003cbr /\u003e\n\n select status, last_msg_receipt_time from pg_stat_wal_receiver;\n\nIf replication is happening, you see the status `streaming` and a recent\nlast_msg_receipt_time: \n\n postgres=\u003e select status, last_msg_receipt_time from pg_stat_wal_receiver;\n status | last_msg_receipt_time\n -----------+-------------------------------\n streaming | 2020-01-21 20:19:51.461535+00\n (1 row)\n\nIf replication is not happening, an empty result is returned: \n\n postgres=\u003e select status, last_msg_receipt_time from pg_stat_wal_receiver;\n status | last_msg_receipt_time\n --------+-----------------------\n (0 rows)\n\nWhat's next:\n------------\n\n- [Promote replicas for regional migration or disaster recovery](/sql/docs/postgres/replication/cross-region-replicas)"]]