In a PostgreSQL high-availability architecture, streaming replication is usually used to implement the primary/standby pair. EDB provides a performance-impact reference:
As can be seen above, HA mode costs roughly 10%-30% in performance. remote_write has little impact because it only requires the WAL to reach the standby's memory, while on depends entirely on the standby's disk performance; in essence the WAL is written twice in series plus one network round trip. In practice, if the primary is not running at a bottleneck (CPU not saturated, I/O not saturated, network not saturated), the TPS under HA is likely the same as for a single instance. HA only becomes a problem when the network turns into the bottleneck, for example when batch jobs must be replicated synchronously.
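For concreteness, here is a minimal sketch of how these durability levels are switched at runtime, assuming a standard psql client connected to the primary; the audit_log table in the last line is purely hypothetical, standing in for a low-value workload:
# show the current durability level
psql -c "SHOW synchronous_commit;"
# cluster-wide: only require the WAL to reach the standby's memory (the "little impact" case above)
psql -c "ALTER SYSTEM SET synchronous_commit = 'remote_write';" -c "SELECT pg_reload_conf();"
# per session: fall back to local commit for low-value writes (hypothetical audit_log table)
psql -c "SET synchronous_commit = 'local'; INSERT INTO audit_log VALUES (now());"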
In the author's environment, ltbench runs very stably at 70,000 TPS with a response time of 1.5 ms, on PMEM storage with a Mellanox RoCE network; the difference between local, remote_apply and on is within 10%, and the primary node's CPU utilization is around 80%.
To achieve automatic failover, the common choices are repmgr, Patroni (written in Python) and pg_auto_failover; we recommend repmgr for failover management, paired with keepalived for transparent failover on the client side.
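For reference, a minimal repmgr.conf sketch for one node might look like the following; the node ID, names and paths are illustrative (borrowed from the layout used later in this article), not a prescribed configuration:
cat > /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf <<'EOF'
node_id=2
node_name='10.19.36.9-defaultcluster'
conninfo='host=10.19.36.9 port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/lightdb/data/defaultCluster'
failover='automatic'      # let repmgrd promote a standby when the primary fails
promote_command='repmgr standby promote -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
reconnect_attempts=3      # consistent with the "1 of 3 attempts" retries in the logs below
reconnect_interval=5
EOF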
The main repmgr commands are as follows:
repmgr primary register — initialise a repmgr installation and register the primary node
repmgr primary unregister — unregister an inactive primary node
repmgr standby clone — clone a PostgreSQL standby node from another PostgreSQL node
repmgr standby register — add a standby's information to the repmgr metadata
repmgr standby unregister — remove a standby's information from the repmgr metadata
repmgr standby promote — promote a standby to a primary
repmgr standby follow — attach a running standby to a new upstream node
repmgr standby switchover — promote a standby to primary and demote the existing primary to a standby
repmgr witness register — add a witness node's information to the repmgr metadata
repmgr witness unregister — remove a witness node's information from the repmgr metadata
=========== The primary, standby and witness commands above are mainly used for initial installation and maintenance; the node, cluster and service commands below are mainly used for day-to-day management (a typical installation sequence is sketched after the list below).
repmgr node status — show overview of a node's basic information and replication status
repmgr node check — performs some health checks on a node from a replication perspective
repmgr node rejoin — rejoin a dormant (stopped) node to the replication cluster
repmgr node service — show or execute the system service command to stop/start/restart/reload/promote a node
repmgr cluster show — display information about each registered node in the replication cluster
repmgr cluster matrix — runs repmgr cluster show on each node and summarizes output
repmgr cluster crosscheck — cross-checks connections between each combination of nodes
repmgr cluster event — output a formatted list of cluster events
repmgr cluster cleanup — purge monitoring history
repmgr service status — display information about the status of repmgrd on each node in the cluster
repmgr service pause — Instruct all repmgrd instances in the replication cluster to pause failover operations
repmgr service unpause — Instruct all repmgrd instances in the replication cluster to resume failover operations
repmgr daemon start — Start the repmgrd daemon on the local node
repmgr daemon stop — Stop the repmgrd daemon on the local node
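To tie these commands together, a typical first-time installation runs roughly as follows. This is a sketch: the primary host, repmgr.conf path and data directory are the ones used in the examples below, and pg_ctl stands in for whatever service wrapper your distribution provides.
# on the primary: register it in the repmgr metadata
repmgr primary register -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf

# on the standby host: clone from the primary, start it, then register it
repmgr -h 10.19.36.9 -U repmgr -d repmgr -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf standby clone --dry-run
repmgr -h 10.19.36.9 -U repmgr -d repmgr -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf standby clone
pg_ctl -D /home/lightdb/data/defaultCluster start
repmgr standby register -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf

# on every node: start the monitoring daemon
repmgr daemon start -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf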
In practice, the most important operations are:
- View the roles and status of the cluster nodes (can be executed on any node; includes information for all nodes).
repmgr cluster show -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                       | Role    | Status    | Upstream                  | Location | Priority | Timeline | Connection string
----+----------------------------+---------+-----------+---------------------------+----------+----------+----------+------------------------------------------------------------------------
 1  | 10.19.36.10-defaultcluster | standby |   running | 10.19.36.9-defaultcluster | default  | 100      | 5        | host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | 10.19.36.9-defaultcluster  | primary | * running |                           | default  | 100      | 5        | host=10.19.36.9 port=5432 user=repmgr dbname=repmgr connect_timeout=2
- View the status of the repmgrd daemon (can be executed on any node; includes information for all nodes).
[lightdb@localhost ~]$ repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                       | Role    | Status    | Upstream                  | repmgrd | PID     | Paused? | Upstream last seen
----+----------------------------+---------+-----------+---------------------------+---------+---------+---------+--------------------
 1  | 10.19.36.10-defaultcluster | standby |   running | 10.19.36.9-defaultcluster | running | 3324044 | no      | 0 second(s) ago
 2  | 10.19.36.9-defaultcluster  | primary | * running |                           | running | 4170296 | no      | n/a
- The --csv option produces CSV output for automated analysis, as follows:
[lightdb@localhost ~]$ repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --csv
1,10.19.36.10-defaultcluster,standby,1,1,3324044,0,100,1,default
2,10.19.36.9-defaultcluster,primary,1,1,4170296,0,100,-1,default
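One simple way to use this in automation, without relying on the meaning of every numeric column, is to keep a known-good baseline of the first three fields (ID, name, role) and alert when they drift. A sketch; the baseline path is hypothetical and is created once with the same pipeline:
#!/bin/bash
CONF=/home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
BASELINE=/home/lightdb/repmgr_topology.baseline   # hypothetical: created once with the pipeline below

repmgr service status -f "$CONF" --csv | cut -d, -f1-3 | sort > /tmp/repmgr_topology.now
if ! diff -q "$BASELINE" /tmp/repmgr_topology.now >/dev/null 2>&1; then
    echo "repmgr topology changed:"
    diff "$BASELINE" /tmp/repmgr_topology.now
    # hook your alerting (mail, Zabbix, Prometheus textfile, ...) here
fi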
- View the role, status and replication lag of the local node (status is usually enough; only the local node's information is included).
[lightdb@localhost ~]$ repmgr node status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
Node "10.19.36.10-defaultcluster":
    LightDB version: 13.3
    Total data size: 9086 MB
    Conninfo: host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
    Role: standby
    WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)   # unless it is set to always, a standby does not archive the WAL it receives and replays
    Archive command: test ! -f /home/lightdb/lightdb-x/13.3-21.2/archive/%f && cp %p /home/lightdb/lightdb-x/13.3-21.2/archive/%f && find /home/lightdb/lightdb-x/13.3-21.2/archive -type f -mmin +10080 | xargs -i rm {}
    WALs pending archiving: 2 pending files
    Replication connections: 0 (of maximal 10)
    Replication slots: 0 physical (of maximal 10; 0 missing)
    Upstream node: 10.19.36.9-defaultcluster (ID: 2)
    Replication lag: 55 seconds   # the first reading may be inaccurate and higher than the actual lag
    Last received LSN: 6/F641CCE8
    Last replayed LSN: 6/F64159D0
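The replication-lag figure above can be cross-checked on the standby with standard PostgreSQL functions (a sketch; note that the timestamp-based lag grows on an idle primary even when no WAL is actually missing, which is also why the first repmgr reading can be misleading):
# received vs. replayed LSN on the standby
psql -Atc "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"
# time since the last replayed transaction
psql -Atc "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"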
If you are not sure, run check as well: check performs a real-time health check. For monitoring, to be conservative, execute both (only the local node's information is included).
[lightdb@localhost ~]$ repmgr node check -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
Node "10.19.36.10-defaultcluster":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (2 pending archive ready files)
    Upstream connection: OK (node "10.19.36.10-defaultcluster" (ID: 1) is attached to expected upstream node "10.19.36.9-defaultcluster" (ID: 2))
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no physical replication slots)
    Missing physical replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/home/lightdb/data/defaultCluster")
- View the connectivity between cluster nodes (can be executed on any node; includes information for all nodes).
[lightdb@localhost ~]$ repmgr cluster crosscheck -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
INFO: connecting to database
 Name                       | ID | 1 | 2
----------------------------+----+---+---
 10.19.36.10-defaultcluster | 1  | * | *
 10.19.36.9-defaultcluster  | 2  | * | *
- Monitor the standby for exceptions (unless the primary's synchronous_commit mode is local; if it is on, remote_apply or remote_write, the standby must be monitored, otherwise a failed standby will block commits on the primary and affect production).
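A minimal cron-style check for this, run on the primary. This is a sketch: it assumes psql connects as a role allowed to read pg_stat_replication, and the alert action is left as a stub.
#!/bin/bash
# if synchronous replication is configured but no synchronous standby is attached,
# commits on the primary will block -- treat that as a production-impacting alert
SYNC_NAMES=$(psql -Atc "SHOW synchronous_standby_names;")
SYNC_COUNT=$(psql -Atc "SELECT count(*) FROM pg_stat_replication WHERE sync_state IN ('sync','quorum');")
if [ -n "$SYNC_NAMES" ] && [ "$SYNC_COUNT" -eq 0 ]; then
    echo "no synchronous standby attached to $(hostname); primary commits may hang"   # replace with real alerting
fi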
============================ The above is mainly about monitoring; now let's look at day-to-day management.
- Restarting a node for maintenance.
[lightdb@lightdb1 ~]$ repmgr service pause -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
NOTICE: node 1 (10.20.137.41-defaultcluster) paused
NOTICE: node 2 (10.20.137.42-defaultcluster) paused
[lightdb@lightdb1 ~]$ repmgr service status -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                        | Role    | Status    | Upstream                    | repmgrd | PID    | Paused? | Upstream last seen
----+-----------------------------+---------+-----------+-----------------------------+---------+--------+---------+--------------------
 1  | 10.20.137.41-defaultcluster | primary | * running |                             | running | 38834  | yes     | n/a
 2  | 10.20.137.42-defaultcluster | standby |   running | 10.20.137.41-defaultcluster | running | 185064 | yes     | 1 second(s) ago
Then modify the parameters and restart the instance (the full flow is sketched below).
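The whole maintenance window, end to end, looks roughly like this. It is a sketch using standard pg_ctl (LightDB ships its own service wrappers), and the data directory path is illustrative:
# 1. pause repmgrd cluster-wide so the planned restart is not treated as a failure
repmgr service pause -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf

# 2. change parameters and restart the instance
pg_ctl -D /home/lightdb/data/defaultCluster restart -m fast

# 3. verify replication has resumed, then re-enable automatic failover
repmgr node check -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
repmgr service unpause -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf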
During this period, the standby's repmgr log keeps retrying the connection, as follows:
[2021-10-30 18:22:14] [INFO] node "10.20.137.42-defaultcluster" (ID: 2) monitoring upstream node "10.20.137.41-defaultcluster" (ID: 1) in normal state
[2021-10-30 18:22:14] [DETAIL] last monitoring statistics update was 2 seconds ago
[2021-10-30 18:23:12] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:12] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:12] [WARNING] unable to connect to upstream node "10.20.137.41-defaultcluster" (ID: 1)
[2021-10-30 18:23:12] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 1 of 3 attempts
[2021-10-30 18:23:12] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:12] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:12] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2021-10-30 18:23:17] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 2 of 3 attempts
[2021-10-30 18:23:17] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:17] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:17] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2021-10-30 18:23:22] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 3 of 3 attempts
[2021-10-30 18:23:22] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:22] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:22] [WARNING] unable to reconnect to node "10.20.137.41-defaultcluster" (ID: 1) after 3 attempts
[2021-10-30 18:23:22] [NOTICE] repmgrd on this node is paused
[2021-10-30 18:23:22] [DETAIL] no failover will be carried out
[2021-10-30 18:23:22] [HINT] execute "repmgr service unpause" to resume normal failover mode
[2021-10-30 18:23:22] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:22] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:22] [ERROR] unable to execute get_primary_current_lsn()
[2021-10-30 18:23:22] [DETAIL] FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally before or while processing the request.
[2021-10-30 18:23:22] [WARNING] unable to retrieve primary's current LSN
[2021-10-30 18:23:24] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:24] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:24] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:24] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:26] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:26] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-30 18:23:26] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:26] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-30 18:23:28] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2021-10-30 18:23:28] [NOTICE] reconnected to upstream node "10.20.137.41-defaultcluster" (ID: 1) after 6 seconds, resuming monitoring
[2021-10-30 18:27:16] [INFO] node "10.20.137.42-defaultcluster" (ID: 2) monitoring upstream node "10.20.137.41-defaultcluster" (ID: 1) in normal state
[2021-10-30 18:27:16] [DETAIL] last monitoring statistics update was 2 seconds ago
The database (pg) log is as follows:
2021-10-30 18:12:30.114559T @ checkpointer 00000[2021-10-29 20:45:28 CST] 0 [115395] DETAIL: Last completed transaction was at log time 2021-10-30 18:12:30.084333+08.
2021-10-30 18:14:41.898079T @ walreceiver 00000[2021-10-30 17:58:15 CST] 0 [144662] LOG: replication terminated by primary server
2021-10-30 18:14:41.898079T @ walreceiver 00000[2021-10-30 17:58:15 CST] 0 [144662] DETAIL: End of WAL reached on timeline 3 at 10/800000A0.
2021-10-30 18:14:41.898109T @ walreceiver XX000[2021-10-30 17:58:15 CST] 0 [144662] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2021-10-30 18:14:41.898250T @ startup 00000[2021-10-29 20:45:28 CST] 0 [115394] LOG: invalid record length at 10/800000A0: wanted 24, got 0
2021-10-30 18:14:41.909281T @ walreceiver XX000[2021-10-30 18:14:41 CST] 0 [158899] FATAL: could not connect to the primary server: server closed the connection unexpectedly
    This probably means the server terminated abnormally before or while processing the request.
2021-10-30 18:14:46.909030T @ walreceiver 00000[2021-10-30 18:14:46 CST] 0 [158962] LOG: started streaming WAL from primary at 10/80000000 on timeline 3
2021-10-30 18:15:30.175149T @ checkpointer 00000[2021-10-29 20:45:28 CST] 0 [115395] LOG: restartpoint starting: time
- Starting and stopping the repmgrd daemon.
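Using the commands listed earlier, this is simply (config path as above):
repmgr daemon start -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
repmgr daemon stop  -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf   # confirm the daemon state afterwards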
- Primary/standby switchover.
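A planned switchover is run from the standby that should become the new primary; --dry-run rehearses it and --siblings-follow re-points any other standbys at the new primary (both are standard repmgr options; config path as above):
repmgr standby switchover -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --siblings-follow --dry-run
repmgr standby switchover -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --siblings-follow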
- Failure recovery. After the failed primary is repaired, ideally it simply rejoins the cluster, but rejoins often fail, as follows:
[lightdb@hs-10-19-36-9 log]$ repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf -d 'host=10.19.36.10 dbname=repmgr user=repmgr'
ERROR: this node cannot attach to rejoin target node 1
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 8/C00000A0
HINT: use --force-rewind to execute lt_rewind
[lightdb@hs-10-19-36-9 log]$ repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf -d 'host=10.19.36.10 dbname=repmgr user=repmgr' --force-rewind
NOTICE: lt_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 8/C00000A0
NOTICE: executing lt_rewind
DETAIL: lt_rewind command is "/home/lightdb/lightdb-x/13.3-21.2/bin/lt_rewind -D '/home/lightdb/data/defaultCluster' --source-server='host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
ERROR: lt_rewind execution failed
DETAIL: lt_rewind: servers diverged at WAL location 8/A0000000 on timeline 5
lt_rewind: error: could not open file "/home/lightdb/data/defaultCluster/pg_wal/000000050000000800000004": No such file or directory
lt_rewind: fatal: could not find previous WAL record at 8/8DF84C18
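The failure above is lt_rewind (pg_rewind) unable to find a WAL segment that has already been recycled. One way out, assuming the archive_command shown earlier has kept a copy, is to put the missing segment back and retry; re-cloning the node with repmgr standby clone --force is the fallback if the archive does not have it either. A sketch:
# copy the missing segment back from the local WAL archive
cp /home/lightdb/lightdb-x/13.3-21.2/archive/000000050000000800000004 \
   /home/lightdb/data/defaultCluster/pg_wal/

# retry the rejoin with rewind
repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf \
    -d 'host=10.19.36.10 dbname=repmgr user=repmgr' --force-rewind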
A note on timelines: PostgreSQL's timeline plays the same role as Oracle's resetlogs; it marks what is effectively a re-initialized incarnation of the instance. The first eight hex characters of a WAL file name are the timeline ID. Every point-in-time recovery, pg_rewind or promote increments the timeline ID by 1 and generates a <NewTimelineID>.history file, so the first history file is 00000002.history. For point-in-time recovery (controlled by the parameters recovery_target_timeline and recovery_target_lsn), the history file records where the timeline switched and the reason, including the recovery target if one was specified. As follows:
cat 00000002.history
1   0/70000D8   after LSN 0/7000060

The fields mean:
1 <parentTLI>   0/70000D8 <switchpoint>   after LSN 0/7000060 <reason>
parentTLI:   ID of the parent timeline
switchpoint: XLogRecPtr of the WAL location where the switch happened
reason:      human-readable explanation of why the timeline was changed
So, after multiple point-in-time recoveries, the history file can end up looking like the above.
Or as in the following case. Looking at pg_wal, you can see that several timeline switches have already accumulated: the history files go back to 00000002.history and the surviving WAL segments are already on timeline 5.
[lightdb@hs-10-19-36-9 defaultCluster]$ cd /home/lightdb/data/defaultCluster/pg_wal
[lightdb@hs-10-19-36-9 pg_wal]$ ll
total 13107224
-rw------- 1 lightdb lightdb       333 Oct 29 19:19 000000010000000000000002.00000028.backup
-rw------- 1 lightdb lightdb        42 Oct 29 13:13 00000002.history
-rw------- 1 lightdb lightdb        85 Oct 29 14:02 00000003.history
-rw------- 1 lightdb lightdb       128 Oct 29 15:49 00000004.history
-rw------- 1 lightdb lightdb 536870912 Oct 29 23:12 000000050000000800000005
-rw------- 1 lightdb lightdb 536870912 Oct 29 23:12 000000050000000800000006
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:39 000000050000000800000007
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:42 000000050000000900000000
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:44 000000050000000900000001
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:46 000000050000000900000002
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:47 000000050000000900000003
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:54 000000050000000900000004
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:49 000000050000000900000005
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:52 000000050000000900000006
-rw------- 1 lightdb lightdb 536870912 Oct 29 23:03 000000050000000900000007
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:39 000000050000000A00000000
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:39 000000050000000A00000001
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:40 000000050000000A00000002
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:41 000000050000000A00000003
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:43 000000050000000A00000004
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:46 000000050000000A00000005
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:48 000000050000000A00000006
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:51 000000050000000A00000007
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:40 000000050000000B00000000
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:41 000000050000000B00000001
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:43 000000050000000B00000002
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:45 000000050000000B00000003
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:50 000000050000000B00000004
-rw------- 1 lightdb lightdb 536870912 Oct 29 22:53 000000050000000B00000005
-rw------- 1 lightdb lightdb       171 Oct 29 16:04 00000005.history
-rw------- 1 lightdb lightdb       214 Oct 29 23:12 00000006.history
drwx------ 2 lightdb lightdb       242 Oct 29 23:12 archive_status
PS: a WAL file name consists of three 8-digit hexadecimal parts, starting from 000000010000000000000001: for example 00000001 (timeline ID), 0000000C (LSN / 0x100000000, i.e. the high 32 bits of the LSN), 000000CE ((LSN % 0x100000000) / wal_segment_size). So WAL file names are generated systematically; the range of the third part depends on the configured WAL segment size.
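A quick sanity check of the formula, using the 512 MB (0x20000000-byte) segments visible in the listing above and the divergence point 8/A0000000 on timeline 5 reported by lt_rewind:
# timeline 5, LSN 8/A0000000, wal_segment_size = 512MB
printf '%08X%08X%08X\n' 5 $((0x8)) $((0xA0000000 / 0x20000000))
# prints 000000050000000800000005 -- exactly one of the segments in the listing above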
Looking at the contents of each history file:
[lightdb@hs-10-19-36-9 pg_wal]$ tail -fn 100 00000005.history
1   0/800000A0  no recovery target specified
2   0/A00000A0  no recovery target specified
3   4/C00000A0  no recovery target specified
4   4/E00000A0  no recovery target specified
^C
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000005.history
1   0/800000A0  no recovery target specified
2   0/A00000A0  no recovery target specified
3   4/C00000A0  no recovery target specified
4   4/E00000A0  no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000004.history
1   0/800000A0  no recovery target specified
2   0/A00000A0  no recovery target specified
3   4/C00000A0  no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000003.history
1   0/800000A0  no recovery target specified
2   0/A00000A0  no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000006.history
1   0/800000A0  no recovery target specified
2   0/A00000A0  no recovery target specified
3   4/C00000A0  no recovery target specified
4   4/E00000A0  no recovery target specified
5   8/A0000000  no recovery target specified
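Finally, to confirm which timeline a node actually ended up on (it should match the newest *.history file), the control file can be queried directly; a small sketch:
# current timeline according to the control file
psql -Atc "SELECT timeline_id FROM pg_control_checkpoint();"
# or, offline: pg_controldata /home/lightdb/data/defaultCluster | grep -i timeline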