Greenplum 6 installation configuration details

Contents

1, Host planning

2, Platform requirements

3, Operating system configuration

1. Disable SELinux and firewall

2. System configuration

3. Synchronize system clock

4. Create gpadmin account

5. Install Java (optional)

4, Install Greenplum database software

1. Install Greenplum database

2. Configure passwordless SSH

3. Confirm software installation

5, Create data store

1. Create a data store on the master and standby master hosts

2. Create a data store on the segment host

6, Verify the system

1. Verify network performance

2. Verify disk I/O and memory bandwidth performance

7, Initialize Greenplum database system

1. Initialize Greenplum database

2. Set Greenplum environment variables

3. Allow client connections

4. Modify parameters

8, Follow-up

References:

1, Host planning

114.112.77.199   master, segment
210.73.209.103   standby master, segment
140.210.73.67    segment

2, Platform requirements

1. Operating system: 64-bit CentOS 7.3 or later. Configure swap space equal to the amount of physical memory.

2. Dependent software packages: apr, apr-util, bash, bzip2, curl, krb5, libcurl, libevent, libxml2, libyaml, zlib, openldap, openssh, openssl, openssl-libs, perl, readline, rsync, R, sed, tar, zip

3. Java: OpenJDK 8 or OpenJDK 11

4. Hardware and network
(1) At least 16 GB of physical memory per host.
(2) All hosts in the cluster are on the same LAN, connected to 10 Gigabit switches. Each host should have at least two 10 Gigabit NICs bonded in mode 4 (802.3ad).
(3) Data partitions use the XFS file system. The master and standby master hosts need only one data partition, /data; each segment host needs two data partitions, /data1 and /data2, used for primaries and mirrors.

Official documents: http://docs.greenplum.org/6-12/install_guide/platform-requirements.html

5. User data space calculation

disk space × 0.7 = usable free space = (2 × user data space) + (user data space / 3)

where 2 × user data space is the space required by primaries plus mirrors, and user data space / 3 is the space required for work files.

For example: with 2 TB of disk space, user data space = 2 TB × 0.7 × 3/7 = 600 GB.
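The same arithmetic as a quick shell check (a sketch; disk_gb holds the raw disk size in GB):

# User data space = disk space * 0.7 * 3/7
disk_gb=2048
echo "user data space (GB): $(echo "scale=1; $disk_gb * 0.7 * 3 / 7" | bc)"   # prints 614.4 for a 2 TB disk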

Official documents: http://docs.greenplum.org/6-12/install_guide/capacity_planning.html

3, Operating system configuration

1. Disable SELinux and firewall

Perform the following steps as root on all hosts.

(1) Check SELinux status

sestatus

(2) If the SELinux status is not disabled, edit the /etc/selinux/config file and set the SELINUX parameter to disabled:

SELINUX=disabled

# Put SELinux into permissive mode for the current session (disabled takes full effect after a reboot)

setenforce 0

(3) Check the firewall status

systemctl status firewalld

If disabled, the output looks like this:

● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

(4) Disable the firewall service if necessary

systemctl stop firewalld.service
systemctl disable firewalld.service

2. System configuration

Perform the following steps as root on all hosts.

(1) Set host names

Edit the /etc/hosts file and add the IP address, host name, and alias of every host in the Greenplum cluster. The master alias is mdw, the standby master alias is smdw, and the segment aliases are sdw1, sdw2, and so on.
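Using the addresses from the host-planning section, a minimal /etc/hosts might look like this (a sketch; in this plan mdw and smdw also run segments, so adjust names and aliases to your own cluster):

114.112.77.199   mdw
210.73.209.103   smdw
140.210.73.67    sdw3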

(2) Set kernel parameters

Edit the /etc/sysctl.conf file and add the following parameter settings:

kernel.shmall = 197951838        # Set to the value of echo $(expr $(getconf _PHYS_PAGES) / 2)
kernel.shmmax = 810810728448     # Set to the value of echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
kernel.shmmni = 4096
vm.overcommit_memory = 2
vm.overcommit_ratio = 95

net.ipv4.ip_local_port_range = 10000 65535
kernel.sem = 500 2048000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100

# Recommended for hosts with more than 64 GB of memory
vm.dirty_background_ratio = 0           # set the ratio parameters to 0 and use the *_bytes parameters below instead
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736  # 1.5 GB
vm.dirty_bytes = 4294967296             # 4 GB

Configure vm.min_free_kbytes to 3% of system memory:

awk 'BEGIN {OFMT = "%.0f";} /MemTotal/ {print "vm.min_free_kbytes =", $2 * .03;}' /proc/meminfo >> /etc/sysctl.conf

Apply the configuration:

sysctl -p
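To spot-check that the values took effect (optional):

# Print a few of the applied values
sysctl kernel.shmall kernel.shmmax vm.overcommit_memory vm.min_free_kbytes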

(3) Set resource limits

Edit the /etc/security/limits.d/20-nproc.conf file and add (or modify) the following settings:

* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072

# Check the current limits

ulimit -a

(4) Set the XFS file system mount options

Edit the /etc/fstab file and set the mount options for XFS file systems to rw,nodev,noatime,nobarrier,inode64, for example:

/dev/data /data xfs rw,nodev,noatime,nobarrier,inode64 0 0
/dev/data1 /data1 xfs rw,nodev,noatime,nobarrier,inode64 0 0
/dev/data2 /data2 xfs rw,nodev,noatime,nobarrier,inode64 0 0

Remount to apply the options, for example:

mount -o remount /data
mount -o remount /data1
mount -o remount /data2

# Check the mounted file systems and their options

mount

(5) Set the disk read-ahead value to 16384

# Get the current value, for example:
/sbin/blockdev --getra /dev/sdb1
# Set the value, for example:
/sbin/blockdev --setra 16384 /dev/sdb1

Add the setting command to the /etc/rc.d/rc.local file and make the file executable, so the setting is re-applied automatically when the system restarts.

chmod +x /etc/rc.d/rc.local
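For example, appending the command (the device name is an assumption; substitute your own data disks):

echo '/sbin/blockdev --setra 16384 /dev/sdb1' >> /etc/rc.d/rc.local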

(6) Set the disk I/O scheduling policy, for example:

echo deadline > /sys/block/sdb/queue/scheduler
echo mq-deadline > /sys/block/nvme0n1/queue/scheduler
echo mq-deadline > /sys/block/nvme1n1/queue/scheduler

Likewise, add these commands to the /etc/rc.d/rc.local file so they are re-applied automatically after a restart.
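For example (again, device names are assumptions; match them to your hardware):

echo 'echo deadline > /sys/block/sdb/queue/scheduler' >> /etc/rc.d/rc.local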

# Note: the following grubby method proved ineffective after a restart
grubby --update-kernel=ALL --args="elevator=mq-deadline"
grubby --info=ALL

(7) Disable transparent large pages (THP)

# View current configuration
cat /sys/kernel/mm/transparent_hugepage/enabled

# Set it to never
echo never > /sys/kernel/mm/transparent_hugepage/enabled

To make the setting persist across restarts:

grubby --update-kernel=ALL --args="transparent_hugepage=never"
grubby --info=ALL

(8) Disable IPC object removal

Edit the /etc/systemd/logind.conf file and set the RemoveIPC parameter:

RemoveIPC=no

Restart the service for the configuration to take effect:

service systemd-logind restart

(9) Set SSH connection thresholds

Edit the /etc/ssh/sshd_config file and set the following parameters:

MaxStartups 10:30:200
MaxSessions 200

Reload the service for the configuration to take effect:

systemctl reload sshd.service

(10) Confirm or configure the time zone

The output of the date command should show China Standard Time (UTC+8), e.g. Thu Feb 25 08:13:00 CST 2021. If the time zone set when installing the operating system is incorrect, run the tzselect command and choose Asia -> China -> Beijing Time -> Yes. Be sure to set the time zone correctly before installing Greenplum, because the values of LC_COLLATE and LC_CTYPE cannot be changed after the Greenplum system is initialized.

3. Synchronize system clock

(1) On the master host, add the NTP server to the /etc/ntp.conf file:

server 101.251.209.250

(2) On the standby master host, add to the /etc/ntp.conf file:

server mdw prefer
server 101.251.209.250

(3) On all segment hosts, add to the /etc/ntp.conf file:

server mdw prefer
server smdw

(4) Start the ntpd service on all hosts and check the synchronization status:

systemctl disable chronyd
systemctl enable ntpd
systemctl start ntpd
ntpstat 
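Optionally, list the NTP peers to confirm which server each host synchronizes against (ntpq ships with the ntp package):

ntpq -p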

4. Create gpadmin account

(1) Create groups and users

groupadd -r -g 1001 gpadmin
useradd gpadmin -r -m -g gpadmin -u 1001
passwd gpadmin

chown -R gpadmin:gpadmin /data
chown -R gpadmin:gpadmin /data1
chown -R gpadmin:gpadmin /data2

(2) Generate SSH key pair

su gpadmin
ssh-keygen -t rsa -b 4096

(3) Grant sudo access to the gpadmin user

visudo

Uncomment the following line:

%wheel ALL=(ALL) NOPASSWD: ALL

Add the gpadmin user to the wheel group:

usermod -aG wheel gpadmin
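A quick check that the grant works (optional):

# Should list '(ALL) NOPASSWD: ALL' for the gpadmin user
su - gpadmin -c 'sudo -l'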

5. Install Java (optional)

# Find Java packages in the yum repository
yum search java | grep -i --color JDK
# Install Java 1.8
yum install -y java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64 
# Verify installation
java -version

Restart the host for all configurations to take effect.

Official documents: http://docs.greenplum.org/6-12/install_guide/prep_os.html

4, Install Greenplum database software

1. Install Greenplum database

Perform the following steps as root on all hosts.

(1) Download the installation package

wget https://github.com/greenplum-db/gpdb/releases/download/6.14.1/open-source-greenplum-db-6.14.1-rhel7-x86_64.rpm

(2) Installation

yum -y install ./open-source-greenplum-db-6.14.1-rhel7-x86_64.rpm

(3) Modify the owner and group of the installation directory

chown -R gpadmin:gpadmin /usr/local/greenplum*
chgrp -R gpadmin /usr/local/greenplum*

2. Configure passwordless SSH

Perform the following steps as gpadmin on the master host.

(1) Set up the Greenplum environment

source /usr/local/greenplum-db/greenplum_path.sh

(2) Enable 1-to-n passwordless SSH

# Copy the current user's public key to the authorized_keys file of every other host in the cluster
ssh-copy-id mdw
ssh-copy-id smdw
ssh-copy-id sdw1
ssh-copy-id sdw2
ssh-copy-id sdw3
...

(3) Create a file named all_host in the gpadmin user's home directory containing all Greenplum host names, for example:

mdw
smdw
sdw3

(4) Enable n-to-n passwordless SSH

gpssh-exkeys -f all_host

3. Confirm software installation

Perform the following steps as gpadmin on the master host.

gpssh -f all_host -e 'ls -l /usr/local/greenplum-db-<version>'

If Greenplum is installed correctly, the command lists the installation directory on every host without prompting for a password.

Official documents: http://docs.greenplum.org/6-12/install_guide/install_gpdb.html

5, Create data store

1. Create a data store on the master and standby master hosts

On the master host, execute the following commands as gpadmin.

mkdir -p /data/master
chown gpadmin:gpadmin /data/master
source /usr/local/greenplum-db/greenplum_path.sh 
gpssh -h smdw -e 'mkdir -p /data/master'
gpssh -h smdw -e 'chown gpadmin:gpadmin /data/master'

2. Create a data store on the segment host

On the master host, perform the following steps as gpadmin.

(1) Create a file named seg_host containing all segment host names, for example:

sdw1
sdw2
sdw3
sdw4

(2) Create primary and mirror data directory locations on all segment hosts at once

source /usr/local/greenplum-db/greenplum_path.sh 
gpssh -f seg_host -e 'mkdir -p /data1/primary'
gpssh -f seg_host -e 'mkdir -p /data1/mirror'
gpssh -f seg_host -e 'mkdir -p /data2/primary'
gpssh -f seg_host -e 'mkdir -p /data2/mirror'
gpssh -f seg_host -e 'chown -R gpadmin /data1/*'
gpssh -f seg_host -e 'chown -R gpadmin /data2/*'
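Optionally verify the directories and their ownership across all segment hosts:

gpssh -f seg_host -e 'ls -ld /data1/primary /data1/mirror /data2/primary /data2/mirror'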

Official documents: http://docs.greenplum.org/6-12/install_guide/create_data_dirs.html

6, Verify the system

1. Verify network performance

On the master host, perform the following steps as gpadmin.

(1) Set up the Greenplum environment

source /usr/local/greenplum-db/greenplum_path.sh

(2) Check point-to-point network transmission speed:

# Parallel pair test (hosts send in both directions at once); suits an even number of hosts
gpcheckperf -f all_host -r N -d /tmp > subnet.out
# Serial pair test (one direction at a time); works for an odd or even number of hosts
gpcheckperf -f all_host -r n -d /tmp > subnet.out

(3) Check full-matrix (many-to-many) network transmission speed:

gpcheckperf -f all_host -r M -d /tmp > subnet.out

Each result should be greater than 100 MB/s.

2. Verify disk I/O and memory bandwidth performance

On the master host, perform the following steps as gpadmin.

(1) Set up the Greenplum environment

source /usr/local/greenplum-db/greenplum_path.sh

(2) Check disk I/O (dd) and memory bandwidth (stream) performance

gpcheckperf -f seg_host -r ds -D -d /data1/primary -d /data2/primary -d /data1/mirror -d /data2/mirror > io.out

Official documents: http://docs.greenplum.org/6-12/install_guide/validate.html

7, Initialize Greenplum database system

1. Initialize Greenplum database

On the master host, perform the following steps as gpadmin.

(1) Set up the Greenplum environment

source /usr/local/greenplum-db/greenplum_path.sh

(2) Create Greenplum database configuration file

# Copy the Greenplum database configuration file template
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config
# Edit the /home/gpadmin/gpinitsystem_config file as follows
ARRAY_NAME="Greenplum Data Platform"
SEG_PREFIX=gpseg
PORT_BASE=6000
declare -a DATA_DIRECTORY=(/data1/primary /data1/primary /data1/primary /data2/primary /data2/primary /data2/primary)
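# The six entries above create six primary segments per host; MIRROR_DATA_DIRECTORY below must contain the same number of entries.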
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
MIRROR_PORT_BASE=7000
declare -a MIRROR_DATA_DIRECTORY=(/data1/mirror /data1/mirror /data1/mirror /data2/mirror /data2/mirror /data2/mirror)

(3) Perform initialization

cd ~
gpinitsystem -c gpinitsystem_config -h seg_host -s smdw -S /data/master/ -O config_template

The gpinitsystem utility verifies the system configuration, ensuring that it can connect to each host and access the data directories specified in the configuration. If all checks pass, it prompts you to confirm the configuration, for example:

=> Continue with Greenplum creation Yy|Nn

Type y to begin initialization. When installation completes successfully, the utility starts the Greenplum database system. You should see:

=> Greenplum Database instance successfully created.

If standby master initialization fails, the log shows the cause, for example:

20210308:13:18:39:082622 gpinitstandby:vvml-z2-greenplum:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20210308:13:18:39:082622 gpinitstandby:vvml-z2-greenplum:gpadmin-[ERROR]:-Parent directory does not exist on host smdw
20210308:13:18:39:082622 gpinitstandby:vvml-z2-greenplum:gpadmin-[ERROR]:-This directory must be created before running gpactivatestandby
20210308:13:18:39:082622 gpinitstandby:vvml-z2-greenplum:gpadmin-[ERROR]:-Failed to create standby
20210308:13:18:39:082622 gpinitstandby:vvml-z2-greenplum:gpadmin-[ERROR]:-Error initializing standby master: Parent directory does not exist

After fixing the cause (here, creating the missing parent directory on smdw), re-run the standby initialization:

gpinitstandby -s smdw

(4) Troubleshooting

If an error occurs during initialization, the whole process fails and may leave a partially created system. Review the error messages and logs to determine the cause and the point of failure. Log files are created in ~/gpAdminLogs.

Depending on when the error occurred, you may need to clean up and retry the gpinitsystem program. For example, if some segment instances are created but some fail, you may need to stop the postgres process and delete any data directories created by gpinitsystem from the data storage area. If necessary, a backout script will be created to help clean up.

If the gpinitsystem program fails and leaves a partially installed system, it creates the following backout script:

~/gpAdminLogs/backout_gpinitsystem_<user>_<timestamp>

You can use this script to clean up the partially created Greenplum database system. The backout script removes any data directories, postgres processes, and log files created by gpinitsystem. For example:

sh backout_gpinitsystem_gpadmin_20071031_121053

After correcting the error that caused gpinitsystem to fail and running the backout script, reinitialize the Greenplum database.

2. Set Greenplum environment variables

On the master host, perform the following steps as gpadmin.

(1) Edit ~/.bashrc and add the following environment variables to the file

source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=postgres
export LD_PRELOAD=/lib64/libz.so.1 ps

(2) Make configuration effective

source ~/.bashrc

(3) Copy the environment file to the standby master

cd ~
scp .bashrc smdw:`pwd`

3. Allow client connections

(1) Edit the /data/master/gpseg-1/pg_hba.conf file and add entries for client IP addresses or network segments. The following entry allows connections from any address:

host   all   all    0.0.0.0/0    md5

Entries in pg_hba.conf are matched in order. The general principle is that earlier entries should have stricter match conditions but may use weaker authentication methods, while later entries should have looser match conditions and stronger authentication methods. Local socket connections use ident authentication.
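After editing pg_hba.conf, the change can be loaded without a full restart (a common practice, not part of the original steps):

# Reload pg_hba.conf and runtime configuration files
gpstop -u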

4. Modify parameters

Set the attribute values in postgresql.conf according to your hardware configuration; https://pgtune.leopard.in.ua/#/ can be used as a reference.

(1) Check status

gpstate
gpstate -e
gpstate -f

(2) Set parameters

gpconfig -c max_connections -v 2500 -m 500
gpconfig -c max_prepared_transactions -v 500
gpconfig -c shared_buffers -v 5GB -m 32GB
gpconfig -c effective_cache_size -v 16GB -m 96GB
gpconfig -c maintenance_work_mem -v 1280MB -m 2GB
gpconfig -c checkpoint_completion_target -v 0.9
gpconfig -c wal_buffers -v 16MB -m 16MB
# gpconfig -c checkpoint_segments -v 32 --skipvalidation
gpconfig -c effective_io_concurrency -v 200
gpconfig -c default_statistics_target -v 100
gpconfig -c random_page_cost -v 1.1
gpconfig -c log_statement -v none
gpconfig -c gp_enable_global_deadlock_detector -v on
gpconfig -c gp_workfile_compression -v on
gpconfig -c gp_max_partition_level -v 1
# Physical memory * 0.9 / (number of primaries + mirrors), in MB. For example, with 256 GB of memory, 6 primaries, and 6 mirrors: 256 * 0.9 / 12 ≈ 19660 MB.
gpconfig -c gp_vmem_protect_limit -v 19660
# On dedicated master and standby hosts, set to the number of CPU cores; on segment hosts, set to CPU cores / (primaries + mirrors). For example, with 64 cores, 6 primaries, and 6 mirrors:
gpconfig -c gp_resqueue_priority_cpucores_per_segment -v 5.3 -m 64
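To confirm a value on both the master and the segments (optional):

# Show the master and segment values for one parameter
gpconfig -s gp_vmem_protect_limit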

(3) Execute checkpoint

psql -c "CHECKPOINT"

(4) Restart Greenplum

gpstop -r

Official documents: http://docs.greenplum.org/6-12/install_guide/init_gpdb.html

8, Follow up

1. Create a temporary tablespace

create tablespace tmptbs location '/data/tmptbs';
alter role all set temp_tablespaces='tmptbs';
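Note that the location directory must already exist on every host (master, standby, and all segments) before running CREATE TABLESPACE; one way to create it (a sketch, run as gpadmin on the master):

gpssh -f all_host -e 'mkdir -p /data/tmptbs'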

2. Create a user

create role dwtest with password '123456' login createdb;

3. Test login

psql -U dwtest -h mdw

References:

1. Official documentation: http://docs.greenplum.org/6-12/install_guide/install_guide.html

2. Detailed Greenplum installation process, Yongge's blog (cnblogs)

3. Greenplum server installation-20201123.docx

4. Greenplum installation and deployment reference manual (Baidu Wenku)
