PostgreSQL uses a process-based architecture. For each client connection, the postmaster forks a backend process to serve that client's requests. This section describes the data structure that each backend process has in shared memory: PGPROC.
I. Data structures
Macro definition
/*
 * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
 * for buffer references in buf_internals.h.  This limitation could be lifted
 * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
 * backends exceed currently realistic configurations. Even if that limitation
 * were removed, we still could not a) exceed 2^23-1 because inval.c stores
 * the backend ID as a 3-byte signed integer, b) INT_MAX/4 because some places
 * compute 4*MaxBackends without any overflow check.  This is rechecked in the
 * relevant GUC check hooks and in RegisterBackgroundWorker().
 */
#define MAX_BACKENDS    0x3FFFF

/* shmqueue.c */
typedef struct SHM_QUEUE
{
    struct SHM_QUEUE *prev;
    struct SHM_QUEUE *next;
} SHM_QUEUE;

/*
 * An invalid pgprocno.  Must be larger than the maximum number of PGPROC
 * structures we could possibly have.  See comments for MAX_BACKENDS.
 */
#define INVALID_PGPROCNO        PG_INT32_MAX
LWLock
Code outside lwlock.c should not directly manipulate the contents of this structure, but we must declare the structure to incorporate LWLocks into other data structures.
/*
 * Code outside of lwlock.c should not manipulate the contents of this
 * structure directly, but we have to declare it here to allow LWLocks to be
 * incorporated into other data structures.
 */
typedef struct LWLock
{
    uint16      tranche;        /* tranche ID */
    pg_atomic_uint32 state;     /* state of exclusive/nonexclusive lockers */
    proclist_head waiters;      /* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
    pg_atomic_uint32 nwaiters;  /* number of waiters */
    struct PGPROC *owner;       /* last exclusive owner of the lock */
#endif
} LWLock;
PGPROC
Each backend process has a PGPROC structure in shared memory.
There is also a global list of currently unused PGPROC structures, which are reused for new backend processes.
The purpose of this data structure is as follows:
PostgreSQL backend processes can't see each other's memory directly, nor can the postmaster see into PostgreSQL backend process memory. Yet they need some way to communicate and co-ordinate, and the postmaster needs a way to keep track of them.
In short, PGPROC is used for coordination and communication between processes, and it lets the postmaster keep track of them.
/*
 * Each backend has a PGPROC struct in shared memory.  There is also a list of
 * currently-unused PGPROC structs that will be reallocated to new backends.
 *
 * links: list link for any list the PGPROC is in.  When waiting for a lock,
 * the PGPROC is linked into that lock's waitProcs queue.  A recycled PGPROC
 * is linked into ProcGlobal's freeProcs list.
 *
 * Note: twophase.c also sets up a dummy PGPROC struct for each currently
 * prepared transaction.  These PGPROCs appear in the ProcArray data structure
 * so that the prepared transactions appear to be still running and are
 * correctly shown as holding locks.  A prepared transaction PGPROC can be
 * distinguished from a real one at need by the fact that it has pid == 0.
 * The semaphore and lock-activity fields in a prepared-xact PGPROC are unused,
 * but its myProcLocks[] lists are valid.
 */
struct PGPROC
{
    /* proc->links MUST BE FIRST IN STRUCT (see ProcSleep,ProcWakeup,etc) */
    SHM_QUEUE   links;          /* list link if process is in a list */
    PGPROC    **procgloballist; /* procglobal list that owns this PGPROC */

    PGSemaphore sem;            /* ONE semaphore to sleep on */
    int         waitStatus;     /* STATUS_WAITING, STATUS_OK or STATUS_ERROR */

    Latch       procLatch;      /* generic latch for process */

    LocalTransactionId lxid;    /* local id of top-level transaction currently
                                 * being executed by this proc, if running;
                                 * else InvalidLocalTransactionId */
    int         pid;            /* Backend's process ID; 0 if prepared xact */
    int         pgprocno;

    /* These fields are zero while a backend is still starting up: */
    BackendId   backendId;      /* This backend's backend ID (if assigned) */
    Oid         databaseId;     /* OID of database this backend is using */
    Oid         roleId;         /* OID of role using this backend */

    Oid         tempNamespaceId;    /* OID of temp schema this backend is
                                     * using */

    bool        isBackgroundWorker; /* true if background worker. */

    /*
     * While in hot standby mode, shows that a conflict signal has been sent
     * for the current transaction. Set/cleared while holding ProcArrayLock,
     * though not required. Accessed without lock, if needed.
     */
    bool        recoveryConflictPending;

    /* Info about LWLock the process is currently waiting for, if any. */
    bool        lwWaiting;      /* true if waiting for an LW lock */
    uint8       lwWaitMode;     /* lwlock mode being waited for */
    proclist_node lwWaitLink;   /* position in LW lock wait list */

    /* Support for condition variables. */
    proclist_node cvWaitLink;   /* position in CV wait list */

    /* Info about lock the process is currently waiting for, if any. */
    /* waitLock and waitProcLock are NULL if not currently waiting. */
    LOCK       *waitLock;       /* Lock object we're sleeping on ... */
    PROCLOCK   *waitProcLock;   /* Per-holder info for awaited lock */
    LOCKMODE    waitLockMode;   /* type of lock we're waiting for */
    LOCKMASK    heldLocks;      /* bitmask for lock types already held on this
                                 * lock object by this backend */

    /*
     * Info to allow us to wait for synchronous replication, if needed.
     * waitLSN is InvalidXLogRecPtr if not waiting; set only by user backend.
     * syncRepState must not be touched except by owning process or WALSender.
     * syncRepLinks used only while holding SyncRepLock.
     */
    XLogRecPtr  waitLSN;        /* waiting for this LSN or higher */
    int         syncRepState;   /* wait state for sync rep */
    SHM_QUEUE   syncRepLinks;   /* list link if process is in syncrep queue */

    /*
     * All PROCLOCK objects for locks held or awaited by this backend are
     * linked into one of these lists, according to the partition number of
     * their lock.
     */
    SHM_QUEUE   myProcLocks[NUM_LOCK_PARTITIONS];

    struct XidCache subxids;    /* cache for subtransaction XIDs */

    /* Support for group XID clearing. */
    /* true, if member of ProcArray group waiting for XID clear */
    bool        procArrayGroupMember;
    /* next ProcArray group member waiting for XID clear */
    pg_atomic_uint32 procArrayGroupNext;

    /*
     * latest transaction id among the transaction's main XID and
     * subtransactions
     */
    TransactionId procArrayGroupMemberXid;

    uint32      wait_event_info;    /* proc's wait information */

    /* Support for group transaction status update. */
    bool        clogGroupMember;    /* true, if member of clog group */
    pg_atomic_uint32 clogGroupNext; /* next clog group member */
    TransactionId clogGroupMemberXid;   /* transaction id of clog group member */
    XidStatus   clogGroupMemberXidStatus;   /* transaction status of clog
                                             * group member */
    int         clogGroupMemberPage;    /* clog page corresponding to
                                         * transaction id of clog group member */
    XLogRecPtr  clogGroupMemberLsn; /* WAL location of commit record for clog
                                     * group member */

    /* Per-backend LWLock.  Protects fields below (but not group fields). */
    LWLock      backendLock;

    /* Lock manager data, recording fast-path locks taken by this backend. */
    uint64      fpLockBits;     /* lock modes held for each fast-path slot */
    Oid         fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
    bool        fpVXIDLock;     /* are we holding a fast-path VXID lock? */
    LocalTransactionId fpLocalTransactionId;    /* lxid for fast-path VXID
                                                 * lock */

    /*
     * Support for lock groups.  Use LockHashPartitionLockByProc on the group
     * leader to get the LWLock protecting these fields.
     */
    PGPROC     *lockGroupLeader;    /* lock group leader, if I'm a member */
    dlist_head  lockGroupMembers;   /* list of members, if I'm a leader */
    dlist_node  lockGroupLink;  /* my member link, if I'm a member */
};
MyProc
Each process has a global variable, MyProc, which points to its own PGPROC:
extern PGDLLIMPORT PGPROC *MyProc;
extern PGDLLIMPORT struct PGXACT *MyPgXact;
II. Source code interpretation
N/A
III. Trace analysis
Start two sessions and execute the same SQL statement:
insert into t_wal_partition(c1,c2,c3) VALUES(0,'HASH0','HAHS0');
Session 1
Start gdb and begin tracing
(gdb) b XLogInsertRecord
Breakpoint 1 at 0x54d122: file xlog.c, line 970.
(gdb) c
Continuing.

Breakpoint 1, XLogInsertRecord (rdata=0xf9cc70 <hdr_rdt>, fpw_lsn=0, flags=1 '\001') at xlog.c:970
970     XLogCtlInsert *Insert = &XLogCtl->Insert;
View data structures in memory
(gdb) p *MyProc $3 = , procgloballist = 0x7fa79c087c98, sem = 0x7fa779fc81b8, waitStatus = 0, procLatch = { is_set = 0, is_shared = true, owner_pid = 1398}, lxid = 3, pid = 1398, pgprocno = 99, backendId = 3, databaseId = 16402, roleId = 10, tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false, lwWaiting = false, lwWaitMode = 0 '\000', lwWaitLink = , cvWaitLink = , waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = { prev = 0x0, next = 0x0}, myProcLocks = {, , , , , , , , , , , , , , , }, subxids = }, procArrayGroupMember = false, procArrayGroupNext = , procArrayGroupMemberXid = 0, wait_event_info = 0, clogGroupMember = false, clogGroupNext = , clogGroupMemberXid = 0, clogGroupMemberXidStatus = 0, clogGroupMemberPage = -1, clogGroupMemberLsn = 0, backendLock = , waiters = }, fpLockBits = 196027139227648, fpRelId = , fpVXIDLock = true, fpLocalTransactionId = 3, lockGroupLeader = 0x0, lockGroupMembers = }, lockGroupLink = }
Note: lwWaiting is false, which indicates that the process is not waiting for an LWLock.
Session 2
Start gdb and begin tracing
(gdb) b heap_insert
Breakpoint 2 at 0x4df4d1: file heapam.c, line 2449.
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007fa7a7ee7a0b in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0, futex=0x7fa779fc8138) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
43      err = lll_futex_wait (futex, expected, private);
The process does not reach heap_insert for now: it is blocked in a futex wait, and the futex address (0x7fa779fc8138) matches the sem field of MyProc shown below.
View data structures in memory
(gdb) p *MyProc $36 = , procgloballist = 0x7fa79c087c98, sem = 0x7fa779fc8138, waitStatus = 0, procLatch = , lxid = 13, pid = 1449, pgprocno = 98, backendId = 4, databaseId = 16402, roleId = 10, tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false, lwWaiting = true, lwWaitMode = 0 '\000', lwWaitLink = , cvWaitLink = , waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = { prev = 0x0, next = 0x0}, myProcLocks = {, , , , , , , , , , , , , , , }, subxids = }, procArrayGroupMember = false, procArrayGroupNext = , procArrayGroupMemberXid = 0, wait_event_info = 16777270, clogGroupMember = false, clogGroupNext = , clogGroupMemberXid = 0, clogGroupMemberXidStatus = 0, clogGroupMemberPage = -1, clogGroupMemberLsn = 0, backendLock = , waiters = { head = 2147483647, tail = 2147483647}}, fpLockBits = 196027139227648, fpRelId = , fpVXIDLock = true, fpLocalTransactionId = 13, lockGroupLeader = 0x0, lockGroupMembers = }, lockGroupLink = { prev = 0x0, next = 0x0}}
Note:
lwWaiting is true: the process is waiting for an LWLock held by Session 1.
In lwWaitLink, next = 114. Here 114 is an index into the allProcs array of the global variable ProcGlobal (type PROC_HDR), that is, ProcGlobal->allProcs[114].
(gdb) p ProcGlobal->allProcs[114] $41 = , procgloballist = 0x0, sem = 0x7fa779fc8938, waitStatus = 0, procLatch = { is_set = 0, is_shared = true, owner_pid = 1351}, lxid = 0, pid = 1351, pgprocno = 114, backendId = -1, databaseId = 0, roleId = 0, tempNamespaceId = 0, isBackgroundWorker = false, recoveryConflictPending = false, lwWaiting = true, lwWaitMode = 1 '\001', lwWaitLink = , cvWaitLink = , waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = , myProcLocks = {, , , , , , , , , , , , , , , }, subxids = }, procArrayGroupMember = false, procArrayGroupNext = , procArrayGroupMemberXid = 0, wait_event_info = 16777270, clogGroupMember = false, clogGroupNext = , clogGroupMemberXid = 0, clogGroupMemberXidStatus = 0, clogGroupMemberPage = 0, clogGroupMemberLsn = 0, backendLock = , waiters = }, fpLockBits = 0, fpRelId = , fpVXIDLock = false, fpLocalTransactionId = 0, lockGroupLeader = 0x0, lockGroupMembers = }, lockGroupLink = { prev = 0x0, next = 0x0}}
ProcGlobal will be introduced in the next section.
IV. References
What is the role of struct 'PGPROC' in PostgreSQL?