Interpretation of PostgreSQL source code (108) - backend process (PGPROC data structure)

PostgreSQL uses a process-per-connection model: for each client connection, the postmaster forks a backend process to serve that client's requests. This section describes the data structure that each backend process has in shared memory: PGPROC.

I. Data structure

Macro definition

/*
 * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
 * for buffer references in buf_internals.h.  This limitation could be lifted
 * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
 * backends exceed currently realistic configurations. Even if that limitation
 * were removed, we still could not a) exceed 2^23-1 because inval.c stores
 * the backend ID as a 3-byte signed integer, b) INT_MAX/4 because some places
 * compute 4*MaxBackends without any overflow check.  This is rechecked in the
 * relevant GUC check hooks and in RegisterBackgroundWorker().
 * Summary: MAX_BACKENDS is capped at 2^18-1, the width reserved for buffer
 * references in buf_internals.h. A 64-bit state could raise the cap, but the
 * value would still be bounded by:
 *   a) 2^23-1, because inval.c stores the backend ID as a 3-byte signed integer
 *   b) INT_MAX/4, because some places compute 4*MaxBackends without an overflow check
 * The value is rechecked in the relevant GUC check hooks and in RegisterBackgroundWorker().
 */
#define MAX_BACKENDS    0x3FFFF

/* shmqueue.c */
typedef struct SHM_QUEUE
{
    struct SHM_QUEUE *prev;
    struct SHM_QUEUE *next;
} SHM_QUEUE;

/*
 * An invalid pgprocno.  Must be larger than the maximum number of PGPROC
 * structures we could possibly have.  See comments for MAX_BACKENDS.
 * Summary: the invalid pgprocno must be larger than the maximum possible
 * number of PGPROC structures; see the comments for MAX_BACKENDS.
 */
#define INVALID_PGPROCNO        PG_INT32_MAX
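
The ceilings named in the MAX_BACKENDS comment, and the requirement that INVALID_PGPROCNO lie above them, can be checked with plain integer arithmetic. The following is a minimal sketch rather than PostgreSQL code: the DEMO_* names and demo_backend_total_ok() are made up for illustration, and a 32-bit int is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as in the listing above: 2^18 - 1. */
#define MAX_BACKENDS 0x3FFFF

/* Illustrative names for the other limits mentioned in the comment. */
#define DEMO_INVAL_LIMIT      ((1 << 23) - 1)  /* 3-byte signed backend ID */
#define DEMO_OVERFLOW_LIMIT   (INT_MAX / 4)    /* keeps 4 * MaxBackends in range */
#define DEMO_INVALID_PGPROCNO INT32_MAX        /* stands in for PG_INT32_MAX */

static_assert(MAX_BACKENDS == (1 << 18) - 1, "0x3FFFF is 2^18 - 1");
static_assert(MAX_BACKENDS <= DEMO_INVAL_LIMIT, "fits in a 3-byte signed integer");
static_assert(MAX_BACKENDS <= DEMO_OVERFLOW_LIMIT, "4 * MaxBackends cannot overflow");
static_assert(DEMO_INVALID_PGPROCNO > MAX_BACKENDS, "sentinel lies above the cap");

/* Hypothetical stand-in for a GUC check hook: reject totals above the cap. */
static bool
demo_backend_total_ok(int connections, int workers)
{
    assert(connections >= 0 && workers >= 0);
    /* Compare without adding first, so the check itself cannot overflow. */
    return connections <= MAX_BACKENDS - workers;
}
```

All four static assertions hold, which shows that 2^18-1 is the tightest of the three limits. A call such as demo_backend_total_ok(100, 8) returns true, while demo_backend_total_ok(262143, 1) returns false.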

LWLock
Code outside lwlock.c should not manipulate the contents of this structure directly, but the structure has to be declared publicly so that LWLocks can be embedded in other data structures.

/*
 * Code outside of lwlock.c should not manipulate the contents of this
 * structure directly, but we have to declare it here to allow LWLocks to be
 * incorporated into other data structures.
 * Summary: treat the fields as private to lwlock.c; the declaration is public
 * only so that an LWLock can be embedded in other structures.
 */
typedef struct LWLock
{
    uint16      tranche;        /* tranche ID */
    //State of the exclusive / shared lockers
    pg_atomic_uint32 state;     /* state of exclusive/nonexclusive lockers */
    //List of waiting PGPROCs
    proclist_head waiters;      /* list of waiting PGPROCs */
#ifdef LOCK_DEBUG               //debug builds only
    //Number of waiters
    pg_atomic_uint32 nwaiters;  /* number of waiters */
    //Last exclusive owner of the lock
    struct PGPROC *owner;       /* last exclusive owner of the lock */
#endif
} LWLock;
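
The state field packs flags and holder counts into a single atomic word. The sketch below decodes such a value under an assumed bit layout modeled on the private macros in lwlock.c; those macros are internal and version-dependent, so the layout should be checked against the source tree actually in use, and all DEMO_* names are hypothetical. Under this assumption, the value 536870912 that gdb prints for backendLock in section III is just the release-OK flag: the lock is free and has no waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout (modeled on lwlock.c internals; not a public API). */
#define DEMO_LW_FLAG_HAS_WAITERS ((uint32_t) 1 << 30)
#define DEMO_LW_FLAG_RELEASE_OK  ((uint32_t) 1 << 29)
#define DEMO_LW_VAL_EXCLUSIVE    ((uint32_t) 1 << 24)
#define DEMO_LW_SHARED_MASK      (((uint32_t) 1 << 24) - 1)

/* The value seen in the gdb traces is exactly one flag bit. */
static_assert(DEMO_LW_FLAG_RELEASE_OK == 536870912u, "1 << 29");

typedef struct DemoLWLockState
{
    bool     has_waiters;     /* some process is queued on the lock */
    bool     release_ok;      /* waiters may be woken on release */
    bool     held_exclusive;  /* one exclusive holder */
    uint32_t shared_holders;  /* number of shared holders */
} DemoLWLockState;

/* Split a raw state word into its assumed components. */
static DemoLWLockState
demo_decode_lwlock_state(uint32_t state)
{
    DemoLWLockState d;

    d.has_waiters = (state & DEMO_LW_FLAG_HAS_WAITERS) != 0;
    d.release_ok = (state & DEMO_LW_FLAG_RELEASE_OK) != 0;
    d.held_exclusive = (state & DEMO_LW_VAL_EXCLUSIVE) != 0;
    d.shared_holders = state & DEMO_LW_SHARED_MASK;
    return d;
}
```

Keeping everything in one word is what allows the lock to be taken and released with atomic operations on state alone; the waiters list only comes into play when a process has to queue.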

PGPROC
Each backend process has a PGPROC structure in shared memory.
There is also a global list of unused PGPROC structures, which are reused when new backend processes start.
The purpose of this data structure is:

PostgreSQL backend processes can't see each other's memory directly, nor can the postmaster see into PostgreSQL backend process memory. Yet they need some way to communicate and co-ordinate, and the postmaster needs a way to keep track of them.

In short, it is used for coordination and communication between processes, and it lets the postmaster keep track of them.


/*
 * Each backend has a PGPROC struct in shared memory.  There is also a list of
 * currently-unused PGPROC structs that will be reallocated to new backends.
 * Summary: every backend owns one PGPROC in shared memory; unused PGPROCs
 * are kept on a list and handed to new backends.
 * 
 * links: list link for any list the PGPROC is in.  When waiting for a lock,
 * the PGPROC is linked into that lock's waitProcs queue.  A recycled PGPROC
 * is linked into ProcGlobal's freeProcs list.
 * Summary: links joins the PGPROC to the list it is currently on: a lock's
 * waitProcs queue while waiting, ProcGlobal's freeProcs list once recycled.
 *
 * Note: twophase.c also sets up a dummy PGPROC struct for each currently
 * prepared transaction.  These PGPROCs appear in the ProcArray data structure
 * so that the prepared transactions appear to be still running and are
 * correctly shown as holding locks.  A prepared transaction PGPROC can be
 * distinguished from a real one at need by the fact that it has pid == 0.
 * The semaphore and lock-activity fields in a prepared-xact PGPROC are unused,
 * but its myProcLocks[] lists are valid.
 * Summary: twophase.c sets up a dummy PGPROC for each prepared transaction,
 * so that the transaction still appears to be running and is correctly shown
 * as holding its locks. Such a PGPROC is recognizable by pid == 0; its
 * semaphore and lock-activity fields are unused, but myProcLocks[] is valid.
 */
struct PGPROC
{
    /* proc->links MUST BE FIRST IN STRUCT (see ProcSleep,ProcWakeup,etc) */
    //proc->links must be the first field of the struct (see ProcSleep, ProcWakeup, etc.)
    //List link, used when the process is in a list
    SHM_QUEUE   links;          /* list link if process is in a list */
    //The ProcGlobal list that owns this PGPROC
    PGPROC    **procgloballist; /* procglobal list that owns this PGPROC */
    //The one semaphore this process sleeps on
    PGSemaphore sem;            /* ONE semaphore to sleep on */
    //One of STATUS_WAITING, STATUS_OK or STATUS_ERROR
    int         waitStatus;     /* STATUS_WAITING, STATUS_OK or STATUS_ERROR */
    //General-purpose latch for the process
    Latch       procLatch;      /* generic latch for process */
    //Local ID of the top-level transaction this process is executing; InvalidLocalTransactionId if none is running
    LocalTransactionId lxid;    /* local id of top-level transaction currently
                                 * being executed by this proc, if running;
                                 * else InvalidLocalTransactionId */
    //Process ID of the backend; 0 for a prepared transaction
    int         pid;            /* Backend's process ID; 0 if prepared xact */
    //Index of this PGPROC in the ProcGlobal->allProcs array (see section III)
    int         pgprocno;

    /* These fields are zero while a backend is still starting up: */
    //------------These fields are zero while the backend is still starting up
    //Backend ID of this backend, once assigned
    BackendId   backendId;      /* This backend's backend ID (if assigned) */
    //OID of the database this backend is using
    Oid         databaseId;     /* OID of database this backend is using */
    //OID of the role using this backend
    Oid         roleId;         /* OID of role using this backend */
    //OID of the temporary schema this backend is using
    Oid         tempNamespaceId;    /* OID of temp schema this backend is
                                     * using */
    //True if this process is a background worker
    bool        isBackgroundWorker; /* true if background worker. */

    /*
     * While in hot standby mode, shows that a conflict signal has been sent
     * for the current transaction. Set/cleared while holding ProcArrayLock,
     * though not required. Accessed without lock, if needed.
     * Summary: in hot standby mode, this flag records that a conflict signal
     * has been sent for the current transaction. It is set and cleared while
     * holding ProcArrayLock, though that is not required, and it may be read
     * without the lock.
     */
    bool        recoveryConflictPending;

    /* Info about LWLock the process is currently waiting for, if any. */
    //--------------Information about the LWLock the process is waiting for, if any
    //True if waiting for an LWLock
    bool        lwWaiting;      /* true if waiting for an LW lock */
    //Mode of the LWLock being waited for
    uint8       lwWaitMode;     /* lwlock mode being waited for */
    //Position in the LWLock wait list
    proclist_node lwWaitLink;   /* position in LW lock wait list */

    /* Support for condition variables. */
    //--------------Support for condition variables
    //Position in the condition variable (CV) wait list
    proclist_node cvWaitLink;   /* position in CV wait list */

    /* Info about lock the process is currently waiting for, if any. */
    //--------------Information about the lock the process is waiting for, if any
    /* waitLock and waitProcLock are NULL if not currently waiting. */
    //waitLock and waitProcLock are NULL when the process is not waiting
    //Lock object the process is sleeping on
    LOCK       *waitLock;       /* Lock object we're sleeping on ... */
    //Per-holder information for the awaited lock
    PROCLOCK   *waitProcLock;   /* Per-holder info for awaited lock */
    //Mode of the lock being waited for
    LOCKMODE    waitLockMode;   /* type of lock we're waiting for */
    //Bitmask of the lock modes this backend already holds on that lock object
    LOCKMASK    heldLocks;      /* bitmask for lock types already held on this
                                 * lock object by this backend */

    /*
     * Info to allow us to wait for synchronous replication, if needed.
     * waitLSN is InvalidXLogRecPtr if not waiting; set only by user backend.
     * syncRepState must not be touched except by owning process or WALSender.
     * syncRepLinks used only while holding SyncRepLock.
     * Summary: these fields let a backend wait for synchronous replication.
     * waitLSN is InvalidXLogRecPtr when not waiting and is set only by a user
     * backend; syncRepState may be changed only by the owning process or a
     * WALSender; syncRepLinks is used only while holding SyncRepLock.
     */
    //--------------Synchronous replication wait information
    //Waiting for this LSN or a higher one
    XLogRecPtr  waitLSN;        /* waiting for this LSN or higher */
    //Wait state for synchronous replication
    int         syncRepState;   /* wait state for sync rep */
    //List link, used when the process is in the syncrep queue
    SHM_QUEUE   syncRepLinks;   /* list link if process is in syncrep queue */

    /*
     * All PROCLOCK objects for locks held or awaited by this backend are
     * linked into one of these lists, according to the partition number of
     * their lock.
     * Summary: every PROCLOCK for a lock this backend holds or awaits is
     * linked into one of these lists, chosen by the lock's partition number.
     */
    SHM_QUEUE   myProcLocks[NUM_LOCK_PARTITIONS];
    //Cache of subtransaction XIDs
    struct XidCache subxids;    /* cache for subtransaction XIDs */

    /* Support for group XID clearing. */
    /* true, if member of ProcArray group waiting for XID clear */
    //--------------Support for group XID clearing
    //True if this process is a member of a ProcArray group waiting for XID clear
    bool        procArrayGroupMember;
    /* next ProcArray group member waiting for XID clear */
    //Next ProcArray group member waiting for XID clear
    pg_atomic_uint32 procArrayGroupNext;

    /*
     * latest transaction id among the transaction's main XID and
     * subtransactions
     * Summary: the latest XID among the transaction's main XID and its subtransactions
     */
    TransactionId procArrayGroupMemberXid;
    //Wait information of the process
    uint32      wait_event_info;    /* proc's wait information */

    /* Support for group transaction status update. */
    //---------------Support for group transaction status update
    //True if this process is a member of a clog group
    bool        clogGroupMember;    /* true, if member of clog group */
    //Next clog group member
    pg_atomic_uint32 clogGroupNext; /* next clog group member */
    //Transaction ID of the clog group member
    TransactionId clogGroupMemberXid;   /* transaction id of clog group member */
    //Transaction status of the clog group member
    XidStatus   clogGroupMemberXidStatus;   /* transaction status of clog
                                             * group member */
    //clog page corresponding to the transaction ID of the clog group member
    int         clogGroupMemberPage;    /* clog page corresponding to
                                         * transaction id of clog group member */
    //WAL location of the commit record for the clog group member
    XLogRecPtr  clogGroupMemberLsn; /* WAL location of commit record for clog
                                     * group member */

    /* Per-backend LWLock.  Protects fields below (but not group fields). */
    //Per-backend LWLock; protects the fields below (but not the group fields)
    LWLock      backendLock;

    /* Lock manager data, recording fast-path locks taken by this backend. */
    //----------Lock manager data, recording the fast-path locks taken by this backend
    //Lock modes held for each fast-path slot
    uint64      fpLockBits;     /* lock modes held for each fast-path slot */
    //Slots for relation OIDs
    Oid         fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
    //Whether a fast-path VXID lock is held
    bool        fpVXIDLock;     /* are we holding a fast-path VXID lock? */
    //lxid for the fast-path VXID lock
    LocalTransactionId fpLocalTransactionId;    /* lxid for fast-path VXID
                                                 * lock */

    /*
     * Support for lock groups.  Use LockHashPartitionLockByProc on the group
     * leader to get the LWLock protecting these fields.
     */
    //---------Support for lock groups
    //          Use LockHashPartitionLockByProc on the group leader to get the LWLock protecting these fields
    //Lock group leader, if this process is a member
    PGPROC     *lockGroupLeader;    /* lock group leader, if I'm a member */
    //List of members, if this process is a leader
    dlist_head  lockGroupMembers;   /* list of members, if I'm a leader */
    //This process's member link, if it is a member
    dlist_node  lockGroupLink;  /* my member link, if I'm a member */
};
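
The fast-path fields are easier to read once fpLockBits is split per slot. The sketch below assumes that each of the 16 slots owns 3 consecutive bits of fpLockBits, lowest slot first, one bit per weak lock mode; this mirrors private macros in lock.c and is an assumption to verify against the source version in use. The DEMO_* names are hypothetical. Applied to the value 196027139227648 from the traces in section III, slots 9 to 14 carry bit 2, slot 15 carries bits 0 and 2, and slots 5 to 8 carry no bits even though fpRelId still shows OIDs there, which suggests leftovers from locks already released.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FP_SLOTS         16  /* FP_LOCK_SLOTS_PER_BACKEND in the traces */
#define DEMO_FP_BITS_PER_SLOT 3   /* assumed: one bit per weak lock mode */
#define DEMO_FP_MASK          ((1u << DEMO_FP_BITS_PER_SLOT) - 1)

/* Return the 3-bit lock-mode mask recorded for one fast-path slot. */
static unsigned
demo_fp_slot_bits(uint64_t fp_lock_bits, int slot)
{
    assert(slot >= 0 && slot < DEMO_FP_SLOTS);
    return (unsigned) ((fp_lock_bits >> (DEMO_FP_BITS_PER_SLOT * slot))
                       & DEMO_FP_MASK);
}

/* Count the slots that currently record at least one lock mode. */
static int
demo_fp_slots_in_use(uint64_t fp_lock_bits)
{
    int used = 0;

    for (int slot = 0; slot < DEMO_FP_SLOTS; slot++)
    {
        if (demo_fp_slot_bits(fp_lock_bits, slot) != 0)
            used++;
    }
    return used;
}
```

If bit 0 stands for the weakest lock mode and bit 2 for the third one, the decoded masks would be consistent with an INSERT taking a row-exclusive lock on each relation it touches, but that mapping is also part of the assumption.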

MyProc
Each backend process has a global variable, MyProc, that points to its own PGPROC structure:

extern PGDLLIMPORT PGPROC *MyProc;
extern PGDLLIMPORT struct PGXACT *MyPgXact;

II. Source code interpretation

N/A

III. Trace analysis

Start two sessions and execute the same SQL statement:

insert into t_wal_partition(c1,c2,c3) VALUES(0,'HASH0','HAHS0');

Session 1
Start gdb and begin tracing

(gdb) b XLogInsertRecord
Breakpoint 1 at 0x54d122: file xlog.c, line 970.
(gdb) c
Continuing.

Breakpoint 1, XLogInsertRecord (rdata=0xf9cc70 <hdr_rdt>, fpw_lsn=0, flags=1 '\001') at xlog.c:970
970     XLogCtlInsert *Insert = &XLogCtl->Insert;

Inspect the data structure in memory

(gdb) p *MyProc
$3 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7fa79c087c98, sem = 0x7fa779fc81b8, waitStatus = 0, procLatch = {
    is_set = 0, is_shared = true, owner_pid = 1398}, lxid = 3, pid = 1398, pgprocno = 99, backendId = 3, 
  databaseId = 16402, roleId = 10, tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false, 
  lwWaiting = false, lwWaitMode = 0 '\000', lwWaitLink = {next = 0, prev = 0}, cvWaitLink = {next = 0, prev = 0}, 
  waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {
    prev = 0x0, next = 0x0}, myProcLocks = {{prev = 0x7fa79c09c588, next = 0x7fa79c09c588}, {prev = 0x7fa79c09c598, 
      next = 0x7fa79c09c598}, {prev = 0x7fa79c09c5a8, next = 0x7fa79c09c5a8}, {prev = 0x7fa79c09c5b8, 
      next = 0x7fa79c09c5b8}, {prev = 0x7fa79c09c5c8, next = 0x7fa79c09c5c8}, {prev = 0x7fa79c09c5d8, 
      next = 0x7fa79c09c5d8}, {prev = 0x7fa79c09c5e8, next = 0x7fa79c09c5e8}, {prev = 0x7fa79c09c5f8, 
      next = 0x7fa79c09c5f8}, {prev = 0x7fa79c09c608, next = 0x7fa79c09c608}, {prev = 0x7fa79c09c618, 
      next = 0x7fa79c09c618}, {prev = 0x7fa79c09c628, next = 0x7fa79c09c628}, {prev = 0x7fa79c09c638, 
      next = 0x7fa79c09c638}, {prev = 0x7fa79c09c648, next = 0x7fa79c09c648}, {prev = 0x7fa79c09c658, 
      next = 0x7fa79c09c658}, {prev = 0x7fa79c09c668, next = 0x7fa79c09c668}, {prev = 0x7fa79be25e70, 
      next = 0x7fa79be25e70}}, subxids = {xids = {0 <repeats 64 times>}}, procArrayGroupMember = false, 
  procArrayGroupNext = {value = 2147483647}, procArrayGroupMemberXid = 0, wait_event_info = 0, clogGroupMember = false, 
  clogGroupNext = {value = 2147483647}, clogGroupMemberXid = 0, clogGroupMemberXidStatus = 0, clogGroupMemberPage = -1, 
  clogGroupMemberLsn = 0, backendLock = {tranche = 58, state = {value = 536870912}, waiters = {head = 2147483647, 
      tail = 2147483647}}, fpLockBits = 196027139227648, fpRelId = {0, 0, 0, 0, 0, 2679, 2610, 2680, 2611, 17043, 17040, 
    17037, 17034, 17031, 17028, 17025}, fpVXIDLock = true, fpLocalTransactionId = 3, lockGroupLeader = 0x0, 
  lockGroupMembers = {head = {prev = 0x7fa79c09c820, next = 0x7fa79c09c820}}, lockGroupLink = {prev = 0x0, next = 0x0}}

Note: lwWaiting is false, so this process is not waiting for any LWLock.
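
The SHM_QUEUE values in this output follow a simple convention: a list head is circular, so an empty head points at itself in both directions, while an element that is on no list has NULL links. Most myProcLocks entries above have prev equal to next, with addresses 0x10 apart, which is consistent with empty heads, and links = {0x0, 0x0} is consistent with a detached element. The sketch below illustrates the convention with hypothetical demo_* helpers; it is a simplified model, not the code in shmqueue.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the SHM_QUEUE shown earlier, under a demo name. */
typedef struct DemoQueue
{
    struct DemoQueue *prev;
    struct DemoQueue *next;
} DemoQueue;

/* An empty list head points at itself in both directions. */
static void
demo_queue_init(DemoQueue *head)
{
    head->prev = head->next = head;
}

/* An element that is on no list has NULL links. */
static void
demo_elem_init(DemoQueue *elem)
{
    elem->prev = elem->next = NULL;
}

static bool
demo_queue_empty(const DemoQueue *head)
{
    return head->next == head;
}

/* Link elem in just before head, i.e. at the tail of the circular list. */
static void
demo_queue_insert_tail(DemoQueue *head, DemoQueue *elem)
{
    assert(elem->prev == NULL && elem->next == NULL);
    elem->next = head;
    elem->prev = head->prev;
    head->prev->next = elem;
    head->prev = elem;
}

/* Unlink elem from its list and reset it to the detached state. */
static void
demo_queue_delete(DemoQueue *elem)
{
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    demo_elem_init(elem);
}
```

Read this way, a myProcLocks entry whose pointers lead outside the PGPROC, such as the last one in the output above, is a head with one element linked in.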

Session 2
Start gdb and begin tracing

(gdb) b heap_insert
Breakpoint 2 at 0x4df4d1: file heapam.c, line 2449.
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007fa7a7ee7a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, 
    futex=0x7fa779fc8138) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
43        err = lll_futex_wait (futex, expected, private);

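For now, execution cannot reach heap_insert: the process is blocked in a futex wait, and the futex address matches the sem field shown below.
Inspect the data structure in memory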

(gdb) p *MyProc
$36 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7fa79c087c98, sem = 0x7fa779fc8138, waitStatus = 0, 
  procLatch = {is_set = 1, is_shared = true, owner_pid = 1449}, lxid = 13, pid = 1449, pgprocno = 98, backendId = 4, 
  databaseId = 16402, roleId = 10, tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false, 
  lwWaiting = true, lwWaitMode = 0 '\000', lwWaitLink = {next = 114, prev = 2147483647}, cvWaitLink = {next = 0, prev = 0}, 
  waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {
    prev = 0x0, next = 0x0}, myProcLocks = {{prev = 0x7fa79c09c238, next = 0x7fa79c09c238}, {prev = 0x7fa79c09c248, 
      next = 0x7fa79c09c248}, {prev = 0x7fa79c09c258, next = 0x7fa79c09c258}, {prev = 0x7fa79c09c268, 
      next = 0x7fa79c09c268}, {prev = 0x7fa79c09c278, next = 0x7fa79c09c278}, {prev = 0x7fa79c09c288, 
      next = 0x7fa79c09c288}, {prev = 0x7fa79c09c298, next = 0x7fa79c09c298}, {prev = 0x7fa79c09c2a8, 
      next = 0x7fa79c09c2a8}, {prev = 0x7fa79c09c2b8, next = 0x7fa79c09c2b8}, {prev = 0x7fa79c09c2c8, 
      next = 0x7fa79c09c2c8}, {prev = 0x7fa79c09c2d8, next = 0x7fa79c09c2d8}, {prev = 0x7fa79c09c2e8, 
      next = 0x7fa79c09c2e8}, {prev = 0x7fa79c09c2f8, next = 0x7fa79c09c2f8}, {prev = 0x7fa79c09c308, 
      next = 0x7fa79c09c308}, {prev = 0x7fa79be21870, next = 0x7fa79be21870}, {prev = 0x7fa79c09c328, 
      next = 0x7fa79c09c328}}, subxids = {xids = {0 <repeats 64 times>}}, procArrayGroupMember = false, 
  procArrayGroupNext = {value = 2147483647}, procArrayGroupMemberXid = 0, wait_event_info = 16777270, 
  clogGroupMember = false, clogGroupNext = {value = 2147483647}, clogGroupMemberXid = 0, clogGroupMemberXidStatus = 0, 
  clogGroupMemberPage = -1, clogGroupMemberLsn = 0, backendLock = {tranche = 58, state = {value = 536870912}, waiters = {
      head = 2147483647, tail = 2147483647}}, fpLockBits = 196027139227648, fpRelId = {0, 0, 0, 0, 0, 2655, 2603, 2680, 
    2611, 17043, 17040, 17037, 17034, 17031, 17028, 17025}, fpVXIDLock = true, fpLocalTransactionId = 13, 
  lockGroupLeader = 0x0, lockGroupMembers = {head = {prev = 0x7fa79c09c4d0, next = 0x7fa79c09c4d0}}, lockGroupLink = {
    prev = 0x0, next = 0x0}}
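
In this output wait_event_info is 16777270, which is 0x01000036 in hexadecimal. The sketch below splits such a value into a wait class (the top byte) and an event ID (the low 16 bits). The masks and the class constant are assumptions modeled on how pgstat reads this field, and the DEMO_* names are hypothetical. Under that reading, the class is the LWLock class and the event ID is 54, which for LWLock waits would be the tranche ID of the lock being waited for.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WAIT_CLASS_MASK 0xFF000000u
#define DEMO_WAIT_EVENT_MASK 0x0000FFFFu
#define DEMO_PG_WAIT_LWLOCK  0x01000000u  /* assumed value of PG_WAIT_LWLOCK */

/* The value from the trace is the LWLock class plus event ID 54. */
static_assert(16777270u == (DEMO_PG_WAIT_LWLOCK | 54u), "0x01000036");

/* Top byte: which kind of thing the process is waiting on. */
static uint32_t
demo_wait_class(uint32_t wait_event_info)
{
    return wait_event_info & DEMO_WAIT_CLASS_MASK;
}

/* Low 16 bits: the specific event within that class. */
static uint16_t
demo_wait_event_id(uint32_t wait_event_info)
{
    return (uint16_t) (wait_event_info & DEMO_WAIT_EVENT_MASK);
}
```

A value of 0, as seen in Session 1, decodes to class 0 and event 0, meaning the process is not waiting.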

Note:
lwWaiting is true: this process is waiting for an LWLock held by Session 1.
lwWaitLink = {next = 114, prev = 2147483647}: next = 114 is the index of the next waiter in the allProcs array of the global variable ProcGlobal (type PROC_HDR), and prev = 2147483647 is INVALID_PGPROCNO, meaning there is no previous waiter.

(gdb) p ProcGlobal->allProcs[114]
$41 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x0, sem = 0x7fa779fc8938, waitStatus = 0, procLatch = {
    is_set = 0, is_shared = true, owner_pid = 1351}, lxid = 0, pid = 1351, pgprocno = 114, backendId = -1, databaseId = 0, 
  roleId = 0, tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false, lwWaiting = true, 
  lwWaitMode = 1 '\001', lwWaitLink = {next = 2147483647, prev = 98}, cvWaitLink = {next = 0, prev = 0}, waitLock = 0x0, 
  waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, 
    next = 0x0}, myProcLocks = {{prev = 0x7fa79c09f738, next = 0x7fa79c09f738}, {prev = 0x7fa79c09f748, 
      next = 0x7fa79c09f748}, {prev = 0x7fa79c09f758, next = 0x7fa79c09f758}, {prev = 0x7fa79c09f768, 
      next = 0x7fa79c09f768}, {prev = 0x7fa79c09f778, next = 0x7fa79c09f778}, {prev = 0x7fa79c09f788, 
      next = 0x7fa79c09f788}, {prev = 0x7fa79c09f798, next = 0x7fa79c09f798}, {prev = 0x7fa79c09f7a8, 
      next = 0x7fa79c09f7a8}, {prev = 0x7fa79c09f7b8, next = 0x7fa79c09f7b8}, {prev = 0x7fa79c09f7c8, 
      next = 0x7fa79c09f7c8}, {prev = 0x7fa79c09f7d8, next = 0x7fa79c09f7d8}, {prev = 0x7fa79c09f7e8, 
      next = 0x7fa79c09f7e8}, {prev = 0x7fa79c09f7f8, next = 0x7fa79c09f7f8}, {prev = 0x7fa79c09f808, 
      next = 0x7fa79c09f808}, {prev = 0x7fa79c09f818, next = 0x7fa79c09f818}, {prev = 0x7fa79c09f828, 
      next = 0x7fa79c09f828}}, subxids = {xids = {0 <repeats 64 times>}}, procArrayGroupMember = false, 
  procArrayGroupNext = {value = 0}, procArrayGroupMemberXid = 0, wait_event_info = 16777270, clogGroupMember = false, 
  clogGroupNext = {value = 0}, clogGroupMemberXid = 0, clogGroupMemberXidStatus = 0, clogGroupMemberPage = 0, 
  clogGroupMemberLsn = 0, backendLock = {tranche = 58, state = {value = 536870912}, waiters = {head = 2147483647, 
      tail = 2147483647}}, fpLockBits = 0, fpRelId = {0 <repeats 16 times>}, fpVXIDLock = false, fpLocalTransactionId = 0, 
  lockGroupLeader = 0x0, lockGroupMembers = {head = {prev = 0x7fa79c09f9d0, next = 0x7fa79c09f9d0}}, lockGroupLink = {
    prev = 0x0, next = 0x0}}
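
Taken together, the two lwWaitLink values describe a two-element wait list: pgprocno 98 (Session 2) comes first and pgprocno 114 second, with INVALID_PGPROCNO (2147483647) terminating the list at both ends. The sketch below walks such an index-linked list. It is a simplified model with hypothetical names: the array stands in for ProcGlobal->allProcs, and only the link fields are kept.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INVALID_PGPROCNO INT32_MAX  /* prints as 2147483647 in gdb */
#define DEMO_NPROCS           128        /* arbitrary size for this sketch */

/* Same shape as the lwWaitLink values above: indexes, not pointers. */
typedef struct DemoWaitLink
{
    int next;
    int prev;
} DemoWaitLink;

/*
 * Walk forward from index `start`, storing each visited index in out[].
 * Returns the number of entries written (at most max_out).
 */
static int
demo_walk_waiters(const DemoWaitLink *links, int start, int *out, int max_out)
{
    int n = 0;

    for (int cur = start;
         cur != DEMO_INVALID_PGPROCNO && n < max_out;
         cur = links[cur].next)
    {
        assert(cur >= 0 && cur < DEMO_NPROCS);
        out[n++] = cur;
    }
    return n;
}
```

Storing indexes instead of pointers keeps each link at two ints and makes the list independent of where the shared memory segment is mapped in a given process.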

ProcGlobal will be introduced in the next section.

IV. References

What is the role of struct 'PGPROC' in PostgreSQL?

Posted on Mon, 02 Dec 2019 11:33:38 -0500 by Saruman