Synchronize data to ES using Canal

1. Canal overview

1. What is Canal?

Canal is an open-source component from Alibaba. Its main purpose is to provide incremental data subscription and consumption based on parsing MySQL incremental logs (binlog). The official documentation on GitHub introduces Canal well, so it is not repeated here; if you are interested, see https://github.com/alibaba/canal

2. Canal version

Canal 1.1.4 introduced its most important WebUI capability: the canal-admin project, which gives Canal dynamic management through a WebUI plus online operation and maintenance capabilities such as configuration, task, and log management. For details, see the Canal Admin Guide. (This article uses version 1.1.5.)

2. Canal sync to ES

1. Server planning

Prepare two servers for a local test:

Server          Deployed services
a-lf-bigdata    mysql, canal-server
b-lf-bigdata    mysql, canal-admin, canal-adapter, es, kibana

2. Install MySQL

A. Install MySQL on both machines

(1) Install the yum repository for MySQL

yum -y localinstall https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

(2) Install MySQL

yum -y install mysql-community-server

(3) Enable MySQL to start on boot

systemctl enable mysqld

(4) Start MySQL

systemctl start mysqld

(5) View MySQL status

systemctl status mysqld

(6) View the temporary root password

grep 'temporary password' /var/log/mysqld.log

(7) Change the root password

mysql -uroot -p
-- the first password change must satisfy the default (strong) password policy
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Root_12root';
-- inspect, then relax, the password policy so a simpler password can be used
SHOW VARIABLES LIKE 'validate_password%';
set global validate_password.policy=0;
set global validate_password.length=1;
ALTER USER 'root'@'localhost' IDENTIFIED BY 'root%123';
exit

B. Create MySQL users

1. Create a data-collection user in the MySQL instance on a-lf-bigdata
mysql -uroot -p
set global validate_password.policy=0;
set global validate_password.length=1;
CREATE USER canal IDENTIFIED BY '2wsxVFR_';
-- GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;
exit

Enable binlog (ROW format) in the MySQL instance on a-lf-bigdata

vi /etc/my.cnf

Add the following configuration

server-id=1
log-bin=mysql-bin
binlog-format=ROW
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
  • log-bin specifies the prefix of the binlog file names; the files are stored in /var/lib/mysql by default.

  • server-id uniquely identifies the database server and must not collide with any other server. Using the last octet of the IP address is recommended; do not leave it at the default.

  • binlog-ignore-db: databases to ignore during synchronization.

  • binlog-do-db: databases to synchronize (if omitted, all databases are synchronized).
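Before restarting MySQL, a small shell sketch like the following can confirm that the fragment above actually made it into the config file (check_binlog_cnf is a hypothetical helper for illustration, not part of any tool):

```shell
#!/bin/sh
# Sanity-check that a my.cnf fragment contains the binlog settings Canal needs.
check_binlog_cnf() {
    cnf="$1"
    for key in server-id log-bin binlog-format; do
        grep -q "^${key}=" "$cnf" || { echo "missing: ${key}"; return 1; }
    done
    # Canal parses row-based binlog events, so ROW format is mandatory
    grep -q '^binlog-format=ROW' "$cnf" || { echo "binlog-format must be ROW"; return 1; }
    echo "ok"
}

# Demo against a temporary copy of the fragment above
tmp=$(mktemp)
printf 'server-id=1\nlog-bin=mysql-bin\nbinlog-format=ROW\n' > "$tmp"
check_binlog_cnf "$tmp"   # prints "ok"
rm -f "$tmp"
```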

Log in to MySQL and check:

mysql -uroot -p
show master status;

If the output of show master status lists the current binlog file and position, binlog has been enabled successfully.

Disable explicit_defaults_for_timestamp (so TIMESTAMP columns keep the implicit DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP behavior)

mysql -uroot -p
SHOW VARIABLES LIKE '%explicit_defaults_for_timestamp%';
set persist explicit_defaults_for_timestamp=0;
SHOW VARIABLES LIKE '%explicit_defaults_for_timestamp%';

Restart MySQL

systemctl restart mysqld
2. Create the canal-admin user in the MySQL instance on b-lf-bigdata
mysql -uroot -p
set global validate_password.policy=0;
set global validate_password.length=1;
CREATE USER canaladmin IDENTIFIED BY '2wsxVFR_';
GRANT ALL ON canal_manager.* TO 'canaladmin'@'%';
FLUSH PRIVILEGES;
exit

3. Install ES

1. Download the installation packages

Download ES and Kibana:

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.0-linux-x86_64.tar.gz
curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-7.4.0-linux-x86_64.tar.gz

2. Install ES

(1) Create a regular user on the server
useradd hadoop
passwd hadoop

Then grant the user sudo permission.

(2) Install ES
  1. Decompress ES
cd /data/liufei/es
tar -zxf elasticsearch-7.4.0-linux-x86_64.tar.gz
ln -s elasticsearch-7.4.0 elasticsearch
  2. Configure environment variables
# Edit profile
vi /etc/profile
# New ES configuration
#elasticsearch
export ES_HOME=/data/liufei/es/elasticsearch
export PATH=${ES_HOME}/bin:$PATH
# Make the configuration take effect
source /etc/profile
  3. Basic configuration

Adjust maximum virtual memory

# Edit profile
vi /etc/sysctl.conf
# Add configuration
vm.max_map_count=262144

After saving and exiting, run the following command to apply the configuration

sysctl -p
  4. Modify permissions

Change ownership of the ES directories to the hadoop user

chown -R hadoop:hadoop elasticsearch
chown -R hadoop:hadoop elasticsearch-7.4.0

Switch to hadoop user

su hadoop
  5. Configure ES
# Edit profile
vi $ES_HOME/config/elasticsearch.yml

# Modify configuration
network.host: 0.0.0.0
discovery.seed_hosts: ["b-lf-bigdata"]
# Note: once network.host is non-loopback, ES 7 enforces its bootstrap checks;
# a single-node setup also needs either discovery.type: single-node
# or cluster.initial_master_nodes: ["b-lf-bigdata"]
  6. Start ES
$ES_HOME/bin/elasticsearch -d
  7. Confirm that ES started successfully
curl http://b-lf-bigdata:9200
(3) Install Kibana
  1. Unzip kibana
cd /data/liufei/es
tar xzvf kibana-7.4.0-linux-x86_64.tar.gz
ln -s kibana-7.4.0-linux-x86_64 kibana
  2. Configure environment variables
# Edit profile
vi /etc/profile
# New Kibana configuration
#kibana
export KIBANA_HOME=/data/liufei/es/kibana
export PATH=${KIBANA_HOME}/bin:$PATH
# Make the configuration take effect
source /etc/profile
  3. Configure Kibana
# Edit profile
vi $KIBANA_HOME/config/kibana.yml

# Modify configuration
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://b-lf-bigdata:9200"]
  4. Start Kibana
nohup $KIBANA_HOME/bin/kibana > $KIBANA_HOME/kibana.out 2>&1 &
  5. View Kibana status

http://b-lf-bigdata:5601

  6. Create an index (in Kibana Dev Tools)
PUT /test_test
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "age": {
                "type": "integer"
            },
            "modified_time": {
                "type": "date"
            }
        }
    }
}
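Equivalently, the index can be created from the shell with curl (assuming ES is listening on b-lf-bigdata:9200 as configured above):

```shell
# Create the test_test index with the mapping above via the ES REST API
curl -X PUT "http://b-lf-bigdata:9200/test_test" \
  -H 'Content-Type: application/json' \
  -d '{
    "mappings": {
      "properties": {
        "name":          { "type": "text" },
        "age":           { "type": "integer" },
        "modified_time": { "type": "date" }
      }
    }
  }'
```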

4. Install Canal

(1) Download Canal

Canal Download address: https://github.com/alibaba/canal/releases
wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.admin-1.1.5.tar.gz
wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.adapter-1.1.5.tar.gz

(2) Install canal admin

  1. Decompress
tar -zxf canal.admin-1.1.5.tar.gz
cd canal-admin

The extracted directory contains the usual bin, conf, lib, and logs subdirectories.

  2. Configure environment variables
# Edit profile
vi /etc/profile

# Add configuration
#canal-admin
export CANAL_ADMIN_HOME=/data/liufei/canal/canal-admin
export PATH=${CANAL_ADMIN_HOME}/bin:$PATH

# Make environment variables effective
source /etc/profile
  3. Modify the configuration
vi $CANAL_ADMIN_HOME/conf/application.yml

# Modify the configuration as follows
server:
  port: 8089
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
spring.datasource:
  address: b-lf-bigdata:3306
  database: canal_manager
  username: canaladmin
  password: 2wsxVFR_
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1
# Username and password for two-way communication between canal-server and canal-admin
  adminUser: admin
  adminPasswd: 123456
  4. Replace the MySQL driver package

Because we installed MySQL 8, the bundled MySQL 5.x driver jar under lib must be replaced with a MySQL 8 (Connector/J 8.x) driver.

  5. Initialize the database
mysql -uroot -p
source /data/liufei/canal/canal-admin/conf/canal_manager.sql
  6. Start canal-admin
sh $CANAL_ADMIN_HOME/bin/startup.sh

View log:


  7. After canal-admin has started successfully, access:

http://b-lf-bigdata:8089

The default login is admin / 123456.

(3) Install canal server

  1. Decompress
tar -zxvf canal.deployer-1.1.5.tar.gz
cd canal-server
  2. Configure environment variables
vi /etc/profile

# New configuration
#canal-server
export CANAL_SERVER_HOME=/home/hadoop/app/canal-server
export PATH=${CANAL_SERVER_HOME}/bin:$PATH

# Make environment variables effective
source /etc/profile
  3. Modify the configuration

vi $CANAL_SERVER_HOME/conf/canal.properties

# register ip
canal.register.ip = a-lf-bigdata

# canal admin config
canal.admin.manager = b-lf-bigdata:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
canal.admin.register.name =
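Note that canal.admin.passwd is not the plaintext password: it is the MySQL-style password hash of the adminPasswd configured in canal-admin ("123456" above), i.e. the uppercase hex of SHA1 applied twice, without MySQL's leading "*". If you change adminPasswd, the matching hash can be computed with standard tools (sketch below uses sha1sum and xxd):

```shell
#!/bin/sh
# Compute the MySQL-style password hash Canal expects in canal.admin.passwd:
# uppercase hex of SHA1(SHA1(password)), without the leading '*'.
mysql_passwd_hash() {
    pass="$1"
    # first SHA1, as hex
    inner=$(printf '%s' "$pass" | sha1sum | awk '{print $1}')
    # second SHA1 over the raw bytes of the first digest
    printf '%s' "$inner" | xxd -r -p | sha1sum | awk '{print toupper($1)}'
}

mysql_passwd_hash 123456   # prints 6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9
```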
  4. Replace the MySQL driver package
  5. Start canal-server
$CANAL_SERVER_HOME/bin/startup.sh
  6. View the log

(4) Configure Instance

  1. Log in to canal-admin
  2. Go to the instance management page and create a new instance

  3. Load the template

  4. Modify the template
canal.instance.master.address=a-lf-bigdata:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=2wsxVFR_
  5. Start the instance and view the log

(5) Create debug database

Operate on a-lf-bigdata

mysql -uroot -p

-- Create the database and test table
CREATE DATABASE IF NOT EXISTS test DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
use test;
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `uid` INT UNSIGNED AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `age` int(3) DEFAULT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

(6) Install ClientAdapter

  1. Decompress
tar -zxf canal.adapter-1.1.5.tar.gz
cd canal-adapter
  2. Configure environment variables
vi /etc/profile

# New configuration
#canal-adapter
export CANAL_ADAPTER_HOME=/data/liufei/canal/canal-adapter
export PATH=${CANAL_ADAPTER_HOME}/bin:$PATH

# Make environment variables effective
source /etc/profile
  3. Modify the service configuration
vi $CANAL_ADAPTER_HOME/conf/application.yml

server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: a-lf-bigdata:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:

  srcDataSources:
    defaultDS:
      url: jdbc:mysql://a-lf-bigdata:3306/test?useUnicode=true
      username: canal
      password: 2wsxVFR_
  canalAdapters:
  - instance: test_to_es # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
#      - name: rdb
#        key: mysql1
#        properties:
#          jdbc.driverClassName: com.mysql.jdbc.Driver
#          jdbc.url: jdbc:mysql://127.0.0.1:3306/mytest2?useUnicode=true
#          jdbc.username: root
#          jdbc.password: 121212
#      - name: rdb
#        key: oracle1
#        properties:
#          jdbc.driverClassName: oracle.jdbc.OracleDriver
#          jdbc.url: jdbc:oracle:thin:@localhost:49161:XE
#          jdbc.username: mytest
#          jdbc.password: m121212
#      - name: rdb
#        key: postgres1
#        properties:
#          jdbc.driverClassName: org.postgresql.Driver
#          jdbc.url: jdbc:postgresql://localhost:5432/postgres
#          jdbc.username: postgres
#          jdbc.password: 121212
#          threads: 1
#          commitSize: 3000
#      - name: hbase
#        properties:
#          hbase.zookeeper.quorum: 127.0.0.1
#          hbase.zookeeper.property.clientPort: 2181
#          zookeeper.znode.parent: /hbase
      - name: es7
        hosts: b-lf-bigdata:9300 # 127.0.0.1:9200 for rest mode
        properties:
          mode: transport # or rest
#          # security.auth: test:123456 #  only used for rest mode
          cluster.name: es # must match cluster.name in elasticsearch.yml
#        - name: kudu
#          key: kudu
#          properties:
#            kudu.master.address: 127.0.0.1 # ',' split multi address

Modify the configuration under the es7 directory (we installed ES 7.4, so the es7 outer adapter is used rather than es6) and create a new test_to_es.yml

dataSourceKey: defaultDS
destination: test_to_es
groupId: g1
esMapping:
  _index: test_test
  _type: _doc
  _id: _id
  upsert: true
  sql: "select a.uid as _id, a.name, a.age, a.modified_time from test a"
  commitBatch: 2
  4. Replace the MySQL driver package
  5. Start the adapter
$CANAL_ADAPTER_HOME/bin/startup.sh

The adapter can later be stopped or restarted with:

$CANAL_ADAPTER_HOME/bin/stop.sh
$CANAL_ADAPTER_HOME/bin/restart.sh
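Besides the startup scripts, the adapter exposes a small REST API on the server.port configured above (8081). For example, a full re-sync of one mapping can be triggered manually; the URL path segments are the outer adapter directory (es7 here) and the mapping file name:

```shell
# Manually trigger a full ETL of the test_to_es.yml mapping via the adapter's REST API
curl -X POST "http://b-lf-bigdata:8081/etl/es7/test_to_es.yml"
```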

(7) Testing

  1. Insert test data
INSERT INTO test.test (name, age) VALUES ("Zhang San",20); 
INSERT INTO test.test (name, age) VALUES ("Li Si",21); 
INSERT INTO test.test (name, age) VALUES ("Wang Wu",22); 
INSERT INTO test.test (name, age) VALUES ("Zhao Liu",23); 
INSERT INTO test.test (name, age) VALUES ("Ma Qi",24);
  2. Check the data in ES
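Assuming the inserts were picked up (the logger outer adapter prints each synced row in the adapter log), the documents can be inspected in Kibana Dev Tools or with plain curl:

```shell
# Count and inspect the documents synced into the test_test index
curl "http://b-lf-bigdata:9200/test_test/_count?pretty"
curl "http://b-lf-bigdata:9200/test_test/_search?pretty"
```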

Tags: MySQL canal

Posted on Tue, 23 Nov 2021 22:49:25 -0500 by cornelombaard