Daily management commands and implementation of the repmgr high-availability component for PostgreSQL

In a PostgreSQL high availability architecture, streaming replication is usually used to implement the primary/standby pair. EDB provides a reference for the performance impact:

(EDB's benchmark figures comparing the synchronous_commit modes are not reproduced here.)

As those figures show, synchronous HA mode costs roughly 10%-30% in throughput. remote_write has little performance impact because the standby only needs to receive the WAL into memory, whereas on depends entirely on the performance of the standby's disk; in essence each commit becomes two serial WAL writes plus one network round trip. In practice, as long as the primary has not hit a bottleneck (CPU not saturated, IO not saturated, network not saturated), the TPS of the HA setup is likely to match that of a single instance. HA only becomes a problem when the network becomes the bottleneck, for example when batch jobs have to be replicated synchronously.

In the author's environment, ltbench runs very stably at 70,000 TPS with a response time of 1.5 ms, on PMEM storage with Mellanox RoCE networking; the difference between local, remote_apply and on is within 10%, and CPU utilization on the primary is about 80%.
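The modes compared above are ordinary PostgreSQL parameters. A minimal sketch of checking and changing them, assuming psql (LightDB may ship a differently named client) and a superuser connection to the primary:

# on the primary: check the current commit mode and the configured synchronous standby
psql -c "SHOW synchronous_commit;" -c "SHOW synchronous_standby_names;"
# switch between off/local/remote_write/on/remote_apply, e.g. the cheaper remote_write:
psql -c "ALTER SYSTEM SET synchronous_commit = 'remote_write';" -c "SELECT pg_reload_conf();"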

To achieve automatic failover, repmgr is commonly used (Patroni, written in Python, and pg_auto_failover are other options). We recommend repmgr for failover management, which keeps failover transparent to the client.

The repmgr command is mainly as follows:

repmgr primary register — initialise a repmgr installation and register the primary node

repmgr primary unregister — unregister an inactive primary node

repmgr standby clone — clone a PostgreSQL standby node from another PostgreSQL node

repmgr standby register — add a standby's information to the repmgr metadata

repmgr standby unregister — remove a standby's information from the repmgr metadata

repmgr standby promote — promote a standby to a primary

repmgr standby follow — attach a running standby to a new upstream node

repmgr standby switchover — promote a standby to primary and demote the existing primary to a standby

repmgr witness register — add a witness node's information to the repmgr metadata

repmgr witness unregister — remove a witness node's information from the repmgr metadata

=========== The node, cluster and service commands are mainly used for daily management; the three categories above (primary, standby, witness) are mainly used for initial installation and maintenance.

repmgr node status — show overview of a node's basic information and replication status

repmgr node check — performs some health checks on a node from a replication perspective

repmgr node rejoin — rejoin a dormant (stopped) node to the replication cluster

repmgr node service — show or execute the system service command to stop/start/restart/reload/promote a node

repmgr cluster show — display information about each registered node in the replication cluster

repmgr cluster matrix — runs repmgr cluster show on each node and summarizes output

repmgr cluster crosscheck — cross-checks connections between each combination of nodes

repmgr cluster event — output a formatted list of cluster events

repmgr cluster cleanup — purge monitoring history

repmgr service status — display information about the status of repmgrd on each node in the cluster

repmgr service pause — Instruct all repmgrd instances in the replication cluster to pause failover operations

repmgr service unpause — Instruct all repmgrd instances in the replication cluster to resume failover operations

repmgr daemon start — Start the repmgrd daemon on the local node

repmgr daemon stop — Stop the repmgrd daemon on the local node

The ones most commonly used in day-to-day operation are shown below:

  • View the roles and status of the cluster nodes (can be run on any node; shows information for all nodes):
repmgr cluster show -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                       | Role    | Status    | Upstream                  | Location | Priority | Timeline | Connection string                                                     
----+----------------------------+---------+-----------+---------------------------+----------+----------+----------+------------------------------------------------------------------------
 1  | 10.19.36.10-defaultcluster | standby |   running | 10.19.36.9-defaultcluster | default  | 100      | 5        | host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | 10.19.36.9-defaultcluster  | primary | * running |                           | default  | 100      | 5        | host=10.19.36.9 port=5432 user=repmgr dbname=repmgr connect_timeout=2 
  • View the status of the repmgrd daemon (can be run on any node; shows information for all nodes):
[lightdb@localhost ~]$ repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                       | Role    | Status    | Upstream                  | repmgrd | PID     | Paused? | Upstream last seen
----+----------------------------+---------+-----------+---------------------------+---------+---------+---------+--------------------
 1  | 10.19.36.10-defaultcluster | standby |   running | 10.19.36.9-defaultcluster | running | 3324044 | no      | 0 second(s) ago    
 2  | 10.19.36.9-defaultcluster  | primary | * running |                           | running | 4170296 | no      | n/a            
  • --csv produces CSV output for automated analysis, as follows:
[lightdb@localhost ~]$ repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --csv
1,10.19.36.10-defaultcluster,standby,1,1,3324044,0,100,1,default
2,10.19.36.9-defaultcluster,primary,1,1,4170296,0,100,-1,default
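For example, a small sketch that pulls out the node name and role from the CSV for scripting (the field positions are inferred from the sample output above, not from the documentation):

repmgr service status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --csv \
    | awk -F, '{ print $2, $3 }'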
  • View the role, status and replication lag of the local node (node status is usually enough; only the local node's information is shown):
[lightdb@localhost ~]$ repmgr node status -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
Node "10.19.36.10-defaultcluster":
    LightDB version: 13.3
    Total data size: 9086 MB
    Conninfo: host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
    Role: standby
    WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)   # unless archive_mode is "always", a standby only replays WAL and never archives it
    Archive command: test ! -f /home/lightdb/lightdb-x/13.3-21.2/archive/%f && cp %p /home/lightdb/lightdb-x/13.3-21.2/archive/%f && find /home/lightdb/lightdb-x/13.3-21.2/archive -type f -mmin +10080 | xargs -i rm {}
    WALs pending archiving: 2 pending files
    Replication connections: 0 (of maximal 10)
    Replication slots: 0 physical (of maximal 10; 0 missing)
    Upstream node: 10.19.36.9-defaultcluster (ID: 2)
    Replication lag: 55 seconds     # the first reading can be inaccurate and may overstate the lag
    Last received LSN: 6/F641CCE8
    Last replayed LSN: 6/F64159D0

If you are not sure, you can also run repmgr node check, which performs real-time checks. For monitoring, to be conservative, run both commands (only the local node's information is shown).

[lightdb@localhost ~]$ repmgr node check -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
Node "10.19.36.10-defaultcluster":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (2 pending archive ready files)
    Upstream connection: OK (node "10.19.36.10-defaultcluster" (ID: 1) is attached to expected upstream node "10.19.36.9-defaultcluster" (ID: 2))
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no physical replication slots)
    Missing physical replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/home/lightdb/data/defaultCluster")
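For scripted monitoring, repmgr node check can also run a single check at a time (for example --role or --replication-lag); a minimal sketch, which assumes a non-zero exit status on failure (verify the exit-code behaviour against your repmgr version):

repmgr node check --role -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf \
    || echo "repmgr role check failed on $(hostname)"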

View the connectivity between cluster nodes (can be run on any node; shows information for all nodes).

[lightdb@localhost ~]$ repmgr cluster crosscheck -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
INFO: connecting to database
 Name                       | ID | 1 | 2
----------------------------+----+---+---
 10.19.36.10-defaultcluster | 1  | * | * 
 10.19.36.9-defaultcluster  | 2  | * | * 
  • Monitor the standby for failures: unless the primary's synchronous_commit mode is local, i.e. if it is on, remote_apply or remote_write, the standby must be monitored, otherwise a standby failure will block commits on the primary and affect production. A monitoring sketch follows.
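A minimal monitoring sketch run against the primary, assuming psql (or the LightDB equivalent client) is available and reusing the repmgr connection string from the examples above; thresholds and alerting are left out:

# one row per attached standby; no rows, or state <> 'streaming', means the standby is gone
psql "host=10.19.36.9 port=5432 user=repmgr dbname=repmgr" -Atc \
    "SELECT application_name, state, sync_state, replay_lag FROM pg_stat_replication;"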

============================ The above is mainly monitoring. Now let's look at daily management.

  • Restarting a node for maintenance.
[lightdb@lightdb1 ~]$ repmgr service pause -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
NOTICE: node 1 (10.20.137.41-defaultcluster) paused
NOTICE: node 2 (10.20.137.42-defaultcluster) paused
[lightdb@lightdb1 ~]$ repmgr service status -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
 ID | Name                        | Role    | Status    | Upstream                    | repmgrd | PID    | Paused? | Upstream last seen
----+-----------------------------+---------+-----------+-----------------------------+---------+--------+---------+--------------------
 1  | 10.20.137.41-defaultcluster | primary | * running |                             | running | 38834  | yes     | n/a                
 2  | 10.20.137.42-defaultcluster | standby |   running | 10.20.137.41-defaultcluster | running | 185064 | yes     | 1 second(s) ago    

. . . . Modify the parameters and restart the instance....
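Putting the whole maintenance flow together, a minimal sketch (pg_ctl stands in for LightDB's control utility and $PGDATA for the data directory; both are assumptions to adapt to your installation):

repmgr service pause   -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
# ... edit postgresql.conf / apply the parameter change ...
pg_ctl restart -D "$PGDATA"
# once the standby has reconnected (see the log below), resume normal failover:
repmgr service unpause -f /mnt/pmem1/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf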

During this period, the standby's repmgr log keeps retrying the connection, as follows:

[2021-10-30 18:22:14] [INFO] node "10.20.137.42-defaultcluster" (ID: 2) monitoring upstream node "10.20.137.41-defaultcluster" (ID: 1) in normal state
[2021-10-30 18:22:14] [DETAIL] last monitoring statistics update was 2 seconds ago
[2021-10-30 18:23:12] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:12] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:12] [WARNING] unable to connect to upstream node "10.20.137.41-defaultcluster" (ID: 1)
[2021-10-30 18:23:12] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 1 of 3 attempts
[2021-10-30 18:23:12] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:12] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:12] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2021-10-30 18:23:17] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 2 of 3 attempts
[2021-10-30 18:23:17] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:17] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:17] [INFO] sleeping up to 5 seconds until next reconnection attempt
[2021-10-30 18:23:22] [INFO] checking state of node "10.20.137.41-defaultcluster" (ID: 1), 3 of 3 attempts
[2021-10-30 18:23:22] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=10.20.137.41 port=5432 fallback_application_name=repmgr"
[2021-10-30 18:23:22] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:22] [WARNING] unable to reconnect to node "10.20.137.41-defaultcluster" (ID: 1) after 3 attempts
[2021-10-30 18:23:22] [NOTICE] repmgrd on this node is paused
[2021-10-30 18:23:22] [DETAIL] no failover will be carried out
[2021-10-30 18:23:22] [HINT] execute "repmgr service unpause" to resume normal failover mode
[2021-10-30 18:23:22] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:22] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:22] [ERROR] unable to execute get_primary_current_lsn()
[2021-10-30 18:23:22] [DETAIL] 
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

[2021-10-30 18:23:22] [WARNING] unable to retrieve primary's current LSN
[2021-10-30 18:23:24] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:24] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:24] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:24] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-10-30 18:23:26] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:26] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-30 18:23:26] [WARNING] unable to ping "host=10.20.137.41 port=5432 user=repmgr dbname=repmgr connect_timeout=2"
[2021-10-30 18:23:26] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-10-30 18:23:28] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2021-10-30 18:23:28] [NOTICE] reconnected to upstream node "10.20.137.41-defaultcluster" (ID: 1) after 6 seconds, resuming monitoring
[2021-10-30 18:27:16] [INFO] node "10.20.137.42-defaultcluster" (ID: 2) monitoring upstream node "10.20.137.41-defaultcluster" (ID: 1) in normal state
[2021-10-30 18:27:16] [DETAIL] last monitoring statistics update was 2 seconds ago

The PostgreSQL log on the standby looks like this:

2021-10-30 18:12:30.114559T  @  checkpointer  00000[2021-10-29 20:45:28 CST] 0 [115395] DETAIL:  Last completed transaction was at log time 2021-10-30 18:12:30.084333+08.
2021-10-30 18:14:41.898079T  @  walreceiver  00000[2021-10-30 17:58:15 CST] 0 [144662] LOG:  replication terminated by primary server
2021-10-30 18:14:41.898079T  @  walreceiver  00000[2021-10-30 17:58:15 CST] 0 [144662] DETAIL:  End of WAL reached on timeline 3 at 10/800000A0.
2021-10-30 18:14:41.898109T  @  walreceiver  XX000[2021-10-30 17:58:15 CST] 0 [144662] FATAL:  could not send end-of-streaming message to primary: no COPY in progress
2021-10-30 18:14:41.898250T  @  startup  00000[2021-10-29 20:45:28 CST] 0 [115394] LOG:  invalid record length at 10/800000A0: wanted 24, got 0
2021-10-30 18:14:41.909281T  @  walreceiver  XX000[2021-10-30 18:14:41 CST] 0 [158899] FATAL:  could not connect to the primary server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2021-10-30 18:14:46.909030T  @  walreceiver  00000[2021-10-30 18:14:46 CST] 0 [158962] LOG:  started streaming WAL from primary at 10/80000000 on timeline 3
2021-10-30 18:15:30.175149T  @  checkpointer  00000[2021-10-29 20:45:28 CST] 0 [115395] LOG:  restartpoint starting: time

 

  • Starting and stopping the repmgrd daemon (example sketched below).
  • Primary/standby switchover (example sketched below).
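Hedged sketches for the two items above, using the commands listed earlier (--dry-run and --siblings-follow are standard repmgr switchover options; confirm them against your version):

# start / stop the repmgrd daemon on the local node
repmgr daemon start -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf
repmgr daemon stop  -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf

# planned switchover: run on the standby that should become the new primary,
# first with --dry-run to check the preconditions, then for real
repmgr standby switchover -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --dry-run
repmgr standby switchover -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf --siblings-follow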
  • Failure recovery. Once the failed primary has been repaired, rejoining it cleanly is the ideal outcome, but rejoins often fail, as follows:
[lightdb@hs-10-19-36-9 log]$ repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf -d 'host=10.19.36.10 dbname=repmgr user=repmgr'
ERROR: this node cannot attach to rejoin target node 1
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 8/C00000A0
HINT: use --force-rewind to execute lt_rewind
[lightdb@hs-10-19-36-9 log]$ repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf -d 'host=10.19.36.10 dbname=repmgr user=repmgr' --force-rewind
NOTICE: lt_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 6 forked off current database system timeline 5 before current recovery point 8/C00000A0
NOTICE: executing lt_rewind
DETAIL: lt_rewind command is "/home/lightdb/lightdb-x/13.3-21.2/bin/lt_rewind -D '/home/lightdb/data/defaultCluster' --source-server='host=10.19.36.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
ERROR: lt_rewind execution failed
DETAIL: lt_rewind: servers diverged at WAL location 8/A0000000 on timeline 5
lt_rewind: error: could not open file "/home/lightdb/data/defaultCluster/pg_wal/000000050000000800000004": There is no such file or directory
lt_rewind: fatal: could not find previous WAL record at 8/8DF84C18
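When lt_rewind fails like this because the required WAL segment has already been removed from pg_wal, two common ways out are sketched below; both are hedged (the archive path is the one from the archive_command shown earlier and is assumed to exist on this node, and re-cloning overwrites the local data directory):

# option 1: copy the missing segment(s) back from the WAL archive, then retry the rejoin
cp /home/lightdb/lightdb-x/13.3-21.2/archive/000000050000000800000004 /home/lightdb/data/defaultCluster/pg_wal/
repmgr node rejoin -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf \
    -d 'host=10.19.36.10 dbname=repmgr user=repmgr' --force-rewind

# option 2: give up on rewind and re-clone this node from the current primary
repmgr -h 10.19.36.10 -U repmgr -d repmgr -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf \
    standby clone --force
# start the cloned instance, then re-register it:
repmgr standby register --force -f /home/lightdb/lightdb-x/13.3-21.2/etc/repmgr/repmgr.conf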

Note, as an aside: PostgreSQL's timeline is analogous to Oracle's resetlogs, effectively marking a re-initialized incarnation of the instance. The first eight hexadecimal digits of a WAL file name are the timeline ID. Every point-in-time recovery, pg_rewind or promote increments the timeline ID by 1 and generates a <NewTimelineID>.history file, so the first history file is 00000002.history. In a point-in-time recovery (controlled by the parameters recovery_target_timeline and recovery_target_lsn), the history file records the requested recovery target and the point actually reached. For example:

cat 00000002.history
1 0/70000D8 after LSN 0/7000060

The fields mean the following:

1 <parentTLI>    0/70000D8 <switchpoint>    after LSN 0/7000060 <reason>

parentTLI:   ID of the parent timeline
switchpoint: XLogRecPtr of the WAL location where the switch happened
reason:      human-readable explanation of why the timeline was changed
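To check which timeline an instance is currently on, the control file can be inspected; a sketch, assuming LightDB exposes pg_controldata (possibly under an lt_ prefix, as with lt_rewind):

pg_controldata /home/lightdb/data/defaultCluster | grep -i timeline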

 

So, after several point-in-time recoveries, the history file can end up looking like the one above.

Or look at pg_wal directly: the oldest remaining WAL segment on timeline 5 is 000000050000000800000005, which means the segment 000000050000000800000004 that lt_rewind asked for above has already been recycled.

[lightdb@hs-10-19-36-9 defaultCluster]$ cd /home/lightdb/data/defaultCluster/pg_wal
[lightdb@hs-10-19-36-9 pg_wal]$ ll
 total 13107224
-rw------- 1 lightdb lightdb       333 10 November 29:19 000000010000000000000002.00000028.backup
-rw------- 1 lightdb lightdb        42 10 December 29:13 00000002.history
-rw------- 1 lightdb lightdb        85 10 June 29-14:02 00000003.history
-rw------- 1 lightdb lightdb       128 10 September 29-15:49 00000004.history
-rw------- 1 lightdb lightdb 536870912 10 September 29-23:12 000000050000000800000005
-rw------- 1 lightdb lightdb 536870912 10 September 29-23:12 000000050000000800000006
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:39 000000050000000800000007
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:42 000000050000000900000000
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:44 000000050000000900000001
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:46 000000050000000900000002
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:47 000000050000000900000003
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:54 000000050000000900000004
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:49 000000050000000900000005
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:52 000000050000000900000006
-rw------- 1 lightdb lightdb 536870912 10 September 29-23:03 000000050000000900000007
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:39 000000050000000A00000000
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:39 000000050000000A00000001
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:40 000000050000000A00000002
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:41 000000050000000A00000003
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:43 000000050000000A00000004
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:46 000000050000000A00000005
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:48 000000050000000A00000006
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:51 000000050000000A00000007
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:40 000000050000000B00000000
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:41 000000050000000B00000001
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:43 000000050000000B00000002
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:45 000000050000000B00000003
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:50 000000050000000B00000004
-rw------- 1 lightdb lightdb 536870912 10 September 29-22:53 000000050000000B00000005
-rw------- 1 lightdb lightdb       171 10 June 29-16:04 00000005.history
-rw------- 1 lightdb lightdb       214 10 September 29-23:12 00000006.history
drwx------ 2 lightdb lightdb       242 10 September 29-23:12 archive_status

PS: WAL file names consist of three 8-digit hexadecimal parts and start from 000000010000000000000001: the first part is the timeline ID, the second is the high 32 bits of the LSN (LSN / 0x100000000), and the third is the segment number within that 4 GB range, i.e. (LSN % 0x100000000) / wal_segment_size. For example, with the default 16 MB segments, 000000010000000C000000CE means timeline 1, LSN high word 0xC, segment 0xCE within it. WAL file names are therefore generated systematically; only the range of the third part depends on the configured WAL segment size (00000000-00000007 here, with 512 MB segments).
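A quick worked example of this arithmetic, using the record lt_rewind complained about above and the 512 MB segment size evident from the listing (536870912 bytes = 0x20000000):

# the record at 8/8DF84C18 on timeline 5 lives in segment:
printf '%08X%08X%08X\n' 5 0x8 $(( 0x8DF84C18 / 0x20000000 ))
# -> 000000050000000800000004, exactly the file lt_rewind could not find in pg_wal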

 

Look at the contents of each history file.

[lightdb@hs-10-19-36-9 pg_wal]$ tail -fn 100 00000005.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified

4    4/E00000A0    no recovery target specified
^C
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000005.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified

4    4/E00000A0    no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000004.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000003.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified
[lightdb@hs-10-19-36-9 pg_wal]$ cat 00000006.history 
1    0/800000A0    no recovery target specified

2    0/A00000A0    no recovery target specified

3    4/C00000A0    no recovery target specified

4    4/E00000A0    no recovery target specified

5    8/A0000000    no recovery target specified

 
