About replication status in PostgreSQL logical subscription

After opening logical subscription, we need to know the status of replication. This can be obtained from several system tables or views in PG
Subscribers
Publisher
How to determine whether the subscription has been synchronized?
Attention points
Reference resources

After opening logical subscription, we need to know the status of replication. This can be obtained from several system tables or views in PG.

Subscribers

pg_subscription_rel

You can know the synchronization status of each table through PG subscription rel

postgres=# select * from pg_subscription_rel; srsubid | srrelid | srsubstate | srsublsn ---------+---------+------------+----------- 18465 | 18446 | r | 0/453EF50 18465 | 18453 | r | 0/453EF88 18465 | 18459 | r | 0/453EFC0 (3 rows)
  • srsubstate
    Status code: i = initialization, d = copying data, s = synchronized, r = ready (normal replication)
  • srsublsn
    The end LSN of the source in the s and r states.

At the beginning, PG copies the base table from the publisher. At this time, the table is in the i and d States. After the base table is copied, record the LSN location to srsublsn.
Then it enters s or r state, and pulls and applies incremental data from the publisher through pgoutput logic decoding.

pg_stat_subscription

PG? Stat? Subscription displays the status of each subscription worker. A subscription contains an application worker, optional
There are also one or more sync worker s for initial synchronization.
The relid on the sync worker indicates the table being initially synchronized; for the apply worker, the relid is NULL.

The latest ﹣ end ﹣ LSN of the apply worker reflects the LSN position where synchronization has been completed, that is, the end position of the last WAL record applied (or skipped).

postgres=# select * from pg_stat_subscription; subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time -------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+------ ------------------------- 18515 | sub1 | 19860 | 18446 | | 2020-04-24 19:29:10.961417+08 | 2020-04-24 19:29:10.961417+08 | | 2020- 04-24 19:29:10.961417+08 18515 | sub1 | 19499 | | 0/4566B50 | 2020-04-24 19:29:05.946996+08 | 2020-04-24 19:29:05.947017+08 | 0/4566B50 | 2020- 04-24 19:29:05.946996+08 (2 rows)

pg_replication_origin_status

PG replication origin status contains the last location to incrementally synchronize from the replication source

postgres=# select * from pg_replication_origin_status; local_id | external_id | remote_lsn | local_lsn ----------+-------------+------------+----------- 1 | pg_18465 | 0/4540208 | 0/470FFD8 (1 row)

The remote [LSN] above is where the last WAL record of the subscriber application is located at the start LSN of the source node (that is, the beginning of executing this WAL record).
If other WAL records (such as those generated by updating other tables or background checkpoint s) are generated on the source node later, they will not be reflected in PG replication origin status.

Publisher

pg_replication_slots

The PG replication slots of the publisher reflect the LSN sites of the logical subscription replication slots.

postgres=# select * from pg_replication_slots; -[ RECORD 1 ]-------+---------- slot_name | sub1 plugin | pgoutput slot_type | logical datoid | 13451 database | postgres temporary | f active | t active_pid | 14058 xmin | catalog_xmin | 755 restart_lsn | 0/4540818 confirmed_flush_lsn | 0/4540850
  • restart_lsn
    Restart? LSN is the oldest WAL address (LSN) that may still be requested by consumers of this slot, and therefore will not be automatically removed during checkpoint.
  • confirmed_flush_lsn
    Confirmed ABCD flush ABCD LSN represents the address (LSN) where the consumer of the logical slot has confirmed to receive the data. Data older than this address is no longer available.

Confirmed ABCD flush ABCD LSN is the end of the last synchronized WAL record (byte alignment is required, actually the start of the next WAL).
restart_lsn is sometimes the starting point of the last synchronized WAL record.

Confirmed ﹣ flush ﹣ LSN is equivalent to the latest ﹣ end ﹣ LSN of PG ﹣ stat ﹣ subscription of subscriber.

The update WAL record of the table in the corresponding subscription range must be completed by the subscriber before it can be regarded as synchronized; for other unrelated wals, it is directly considered as synchronized, and the next WAL continues to be processed.

In the following example, we lock a subscription table at the subscriber end, which makes the subscriber unable to apply this INSERT WAL. All confirmed_flush_lsn are suspended in front of this WAL (0 / 4540850).

[postgres@sndsdevdb18 citus]$ pg_waldump worker1/pg_wal/000000010000000000000004 -s 0/045407A8 -n 5 rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/045407A8, prev 0/04540770, desc: CHECKPOINT_ONLINE redo 0/4540770; tli 1; prev tli 1; fpw true; xid 0:755; oid 24923; multi 1; offset 0; oldest xid 548 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 555/754; oldest running xid 755; online rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/04540818, prev 0/045407A8, desc: RUNNING_XACTS nextXid 755 latestCompletedXid 754 oldestRunningXid 755 rmgr: Heap len (rec/tot): 69/ 130, tx: 755, lsn: 0/04540850, prev 0/04540818, desc: INSERT off 2, blkref #0: rel 1663/13451/17988 blk 0 FPW rmgr: Transaction len (rec/tot): 46/ 46, tx: 755, lsn: 0/045408D8, prev 0/04540850, desc: COMMIT 2020-04-24 14:22:20.531476 CST rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/04540908, prev 0/045408D8, desc: RUNNING_XACTS nextXid 756 latestCompletedXid 755 oldestRunningXid 756

pg_stat_replication

For a logical subscription, the replication status of the application worker can be seen in PG stat replication. The values of write LSN, flush LSN, replay LSN and PG replication slots are the same. The application name of the replication of the application worker is the subscription name.

postgres=# select * from pg_stat_replication ; pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | bac kend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state -------+----------+----------+-----------------------+-------------+-----------------+-------------+-------------------------------+---- ----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+----------- - 19861 | 10 | postgres | sub1_18515_sync_18446 | | | -1 | 2020-04-24 19:29:10.964055+08 | | startup | | | | | | | | 0 | async 19500 | 10 | postgres | sub1 | | | -1 | 2020-04-24 19:26:59.950652+08 | | streaming | 0/4566B50 | 0/4566B50 | 0/4566B50 | 0/4566B50 | | | | 0 | async (2 rows)

Optionally, you may also see replication that sync worker temporarily creates for initial synchronization. The application name of the replication of sync worker is the subscription name plus the synchronization table information.

What is it for understanding the replication of sync worker? We need to first look at the processing logic of sync worker.

When sync worker initially synchronizes a table, there are several steps as follows

  1. Create temporary replication slot for replication of sync worker
  2. copy table data from source to destination
  3. Record the lsn at copy completion to the srsublsn at PG subscription rel
  4. Compare srsublsn and apply worker's current synchronization point lsn(latest_end_lsn)

    4.1 if srsublsn is less than latest ﹐ end ﹐ LSN, change the synchronization state to s 4.2 if srsublsn is greater than latest ﹣ end ﹣ LSN, pull the incremental data of this table through the replication slot of 1. After catching up with the application worker, change the synchronization status to s
  5. The subsequent incremental synchronization work is handed over to the application worker

How to determine whether the subscription has been synchronized?

When all tables are in the s or r state, as long as the latest ﹣ end ﹣ lsn of the subscriber catches up with the current lsn of the publisher.

If we switch data tables through logical subscription, we can perform the following steps to ensure data synchronization

  1. Create subscription and wait for all tables to complete basic synchronization

    That is, all tables are in s or r state in 'PG subscription rel'
  2. Disable update of lock table at publisher
  3. Get the current lsn of the publisher
  4. Get the latest ﹣ end ﹣ lsn (or other equivalent indicator of the publisher) of the subscriber. If more than 3 LSNS are obtained, the data is synchronized.
  5. Repeat 4 if not already synchronized

Attention points

The following synchronization location information reflects the synchronization sites of tables already in the s or r state.

  • pg_replication_slots
  • pg_stat_replication
  • pg_replication_origin_status

For the tables that have not completed the initial synchronization, after the subscriber copies the initial data, it will pull the incremental WAL with a temporary replication slot until it catches up with the application worker.
After catching up, the synchronization status is modified to s, and the subsequent incremental synchronization is handed over to the application worker.
Therefore, when we judge the LSN location of the overall replication of the subscription, we must wait for all tables to complete the initial synchronization before it makes sense.

Reference resources

Detailed reference:
https://github.com/ChenHuajun/chenhuajun.github.io/blob/master/_posts/2018-07-30-PostgreSQL Logical subscription process resolution.md

8 July 2021, 13:32 | Views: 167

Add new comment

For adding a comment, please log in
or create account

0 comments