I. Canal overview
1. What is Canal?
Canal is an open-source component from Alibaba. Its main purpose is to provide incremental data subscription and consumption based on parsing MySQL incremental logs (binlog). The official documentation on GitHub introduces canal very well, so I won't repeat it here; if you are interested, see https://github.com/alibaba/canal
2. Canal version
Canal 1.1.4 introduced its most important WebUI capability: the canal-admin project, which gives canal web-based dynamic management as well as online GUI operation and maintenance features such as configuration, task management, and log viewing. For details, see the Canal Admin Guide.
II. Canal sync to ES
1. Server planning
Prepare two servers for the local test:

| Server | Deployed services |
| --- | --- |
| a-lf-bigdata | mysql, canal-server |
| b-lf-bigdata | mysql, canal-admin, canal-adapter, es, kibana |
2. Install MySQL
A. Install MySQL on both machines
(1) Install the yum repository for MySQL
yum -y localinstall https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
(2) Install MySQL
yum -y install mysql-community-server
(3) Enable MySQL to start on boot
systemctl enable mysqld
(4) Start MySQL
systemctl start mysqld
(5) View MySQL status
systemctl status mysqld
(6) View the temporary root password
grep 'temporary password' /var/log/mysqld.log
(7) Change root password
mysql -uroot -p

-- a strong password is required before the validation policy can be relaxed
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Root_12root';
SHOW VARIABLES LIKE 'validate_password%';
-- relax the password policy, then set the simpler password
set global validate_password.policy=0;
set global validate_password.length=1;
ALTER USER 'root'@'localhost' IDENTIFIED BY 'root%123';
exit
B. Create MySQL users
1. Create a data-collection user in the MySQL instance on a-lf-bigdata
mysql -uroot -p

set global validate_password.policy=0;
set global validate_password.length=1;
CREATE USER canal IDENTIFIED BY '2wsxVFR_';
-- GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
exit
Enable binlog in the MySQL instance on a-lf-bigdata
vi /etc/my.cnf
Add the following configuration
server-id=1
log-bin=mysql-bin
binlog-format=ROW
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
- log-bin: specifies the prefix of the binlog file names; the files are stored under /var/lib/mysql by default.
- server-id: uniquely identifies this database instance and must not be repeated on other servers. Using the last octet of the IP address is recommended; it must not be left unset.
- binlog-ignore-db: databases to ignore during synchronization.
- binlog-do-db: databases to synchronize (if this option is absent, all databases are synchronized).
Log in to mysql to view:
mysql -uroot -p

show master status;
If the output lists the current binlog file and position, binlog has been enabled successfully.
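For reference, the output looks roughly like this (file name and position will differ on your machine; GTID columns omitted):

mysql> show master status;
+------------------+----------+--------------+--------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB                                 |
+------------------+----------+--------------+--------------------------------------------------+
| mysql-bin.000001 |      156 |              | information_schema,mysql,performance_schema,sys  |
+------------------+----------+--------------+--------------------------------------------------+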
Disable explicit_defaults_for_timestamp
mysql -uroot -p

SHOW VARIABLES LIKE '%explicit_defaults_for_timestamp%';
set persist explicit_defaults_for_timestamp=0;
SHOW VARIABLES LIKE '%explicit_defaults_for_timestamp%';
Restart MySQL
systemctl restart mysqld
2. Create a canal-admin user in the MySQL instance on b-lf-bigdata
mysql -uroot -p

set global validate_password.policy=0;
set global validate_password.length=1;
CREATE USER canaladmin IDENTIFIED BY '2wsxVFR_';
GRANT ALL ON canal_manager.* TO 'canaladmin'@'%';
FLUSH PRIVILEGES;
exit
3. Install ES
1. Download the installation packages
Download ES and Kibana:
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.0-linux-x86_64.tar.gz
curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-7.4.0-linux-x86_64.tar.gz
2. Install ES
(1) Create a normal user for the server
useradd hadoop
passwd hadoop
Then grant it sudo permission.
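One simple way on CentOS 7 (this assumes the default /etc/sudoers, where the wheel group already has sudo rights):

usermod -aG wheel hadoop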
(2) Install ES
- Decompress ES
cd /data/liufei/es
tar -zxf elasticsearch-7.4.0-linux-x86_64.tar.gz
ln -s elasticsearch-7.4.0 elasticsearch
- Configure environment variables
# Edit profile
vi /etc/profile

# Add the ES configuration
#elasticsearch
export ES_HOME=/data/liufei/es/elasticsearch
export PATH=${ES_HOME}/bin:$PATH

# Make the configuration take effect
source /etc/profile
- Basic configuration
Adjust maximum virtual memory
# Edit the sysctl configuration
vi /etc/sysctl.conf

# Add configuration
vm.max_map_count=262144
After saving and exiting, run the following command to make the configuration take effect:
sysctl -p
- Modify permissions
Change the owner of the ES directories to the hadoop user
chown -R hadoop:hadoop elasticsearch
chown -R hadoop:hadoop elasticsearch-7.4.0
Switch to hadoop user
su hadoop
- Configure ES
# Edit the ES configuration
vi $ES_HOME/config/elasticsearch.yml

# Modify configuration
cluster.name: es   # must match the cluster.name used by canal-adapter later
network.host: 0.0.0.0
discovery.seed_hosts: ["b-lf-bigdata"]
cluster.initial_master_nodes: ["b-lf-bigdata"]   # required by ES 7 bootstrap checks once network.host is set
- Start ES
$ES_HOME/bin/elasticsearch -d
- Confirm that ES is started successfully
curl http://b-lf-bigdata:9200
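If ES is up, the response is a small JSON document roughly like the sketch below (some fields abridged; cluster_name follows the configured cluster.name):

{
  "name" : "b-lf-bigdata",
  "cluster_name" : "es",
  "version" : {
    "number" : "7.4.0"
  },
  "tagline" : "You Know, for Search"
}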
(3) Install Kibana
- Unzip kibana
cd /data/liufei/es
tar xzvf kibana-7.4.0-linux-x86_64.tar.gz
ln -s kibana-7.4.0-linux-x86_64 kibana
- Configure environment variables
# Edit profile
vi /etc/profile

# Add the Kibana configuration
#kibana
export KIBANA_HOME=/data/liufei/es/kibana
export PATH=${KIBANA_HOME}/bin:$PATH

# Make the configuration take effect
source /etc/profile
- Configure Kibana
# Edit the Kibana configuration
vi $KIBANA_HOME/config/kibana.yml

# Modify configuration
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://b-lf-bigdata:9200"]
- Start Kibana
nohup $KIBANA_HOME/bin/kibana > $KIBANA_HOME/kibana.out 2>&1 &
- View Kibana status
http://b-lf-bigdata:5601
- Create an index (in Kibana Dev Tools)
PUT /test_test
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "integer" },
      "modified_time": { "type": "date" }
    }
  }
}
4. Install Canal
(1) Download Canal
Canal download address: https://github.com/alibaba/canal/releases

wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.admin-1.1.5.tar.gz
wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.adapter-1.1.5.tar.gz
(2) Install canal-admin
- Decompress
tar -zxf canal.admin-1.1.5.tar.gz
cd canal-admin
The decompressed directory contains bin, conf, lib, and logs.
- Configure environment variables
# Edit profile
vi /etc/profile

# Add configuration
#canal-admin
export CANAL_ADMIN_HOME=/data/liufei/canal/canal-admin
export PATH=${CANAL_ADMIN_HOME}/bin:$PATH

# Make environment variables effective
source /etc/profile
- Modify configuration
vi $CANAL_ADMIN_HOME/conf/application.yml

# Modify the configuration as follows
server:
  port: 8089
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

spring.datasource:
  address: b-lf-bigdata:3306
  database: canal_manager
  username: canaladmin
  password: 2wsxVFR_
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1

# The account used for two-way communication between canal-server and canal-admin
adminUser: admin
adminPasswd: 123456
- Replace MySQL driver package
Because canal-admin ships with a MySQL 5.x driver while we installed MySQL 8.0, the driver jar in the lib directory must be replaced with a MySQL 8.x connector.
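A minimal sketch of the swap, assuming the bundled jar is a 5.1.x connector and that an 8.x connector jar has already been downloaded (both file names are assumptions; check your lib directory):

# remove the bundled 5.x driver (file name assumed)
rm -f $CANAL_ADMIN_HOME/lib/mysql-connector-java-5.1.*.jar
# drop in the separately downloaded 8.x driver (hypothetical file name)
cp mysql-connector-java-8.0.16.jar $CANAL_ADMIN_HOME/lib/

The same swap applies to canal-server and canal-adapter below.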
- Initialize database
mysql -uroot -p

source /data/liufei/canal/canal-admin/conf/canal_manager.sql
- Start canal admin
sh $CANAL_ADMIN_HOME/bin/startup.sh
View log:
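Assuming the default log layout of canal-admin, the startup log can be followed with:

tail -f $CANAL_ADMIN_HOME/logs/admin.log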
- Once canal-admin has started successfully, access it at:
http://b-lf-bigdata:8089
(3) Install canal server
- Decompress
tar -zxvf canal.deployer-1.1.5.tar.gz
cd canal-server
- Configure environment variables
vi /etc/profile

# Add configuration
#canal-server
export CANAL_SERVER_HOME=/home/hadoop/app/canal-server
export PATH=${CANAL_SERVER_HOME}/bin:$PATH

# Make environment variables effective
source /etc/profile
- Modify configuration
vi canal.properties

# register ip (the address of the host this canal-server runs on)
canal.register.ip = a-lf-bigdata

# canal admin config
canal.admin.manager = b-lf-bigdata:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
canal.admin.register.name =
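Note that canal.admin.passwd is not the plaintext 123456 configured in canal-admin: it is stored as a MySQL PASSWORD()-style hash. The value above can be reproduced in any MySQL shell:

-- MySQL PASSWORD()-style hash of the adminPasswd value (123456)
select upper(sha1(unhex(sha1('123456'))));
-- -> 6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9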
- Replace MySQL driver package
- Start
$CANAL_SERVER_HOME/bin/startup.sh
- View the log
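Assuming the default log layout of canal-server, the main log and the instance log can be followed with:

tail -f $CANAL_SERVER_HOME/logs/canal/canal.log
tail -f $CANAL_SERVER_HOME/logs/test_to_es/test_to_es.log   # appears once the instance is running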
(4) Configure Instance
- Log in to canal-admin (the default web account is admin / 123456)
- Enter the instance management page to create a new instance
- Load template
- Modify template
canal.instance.master.address=a-lf-bigdata:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=2wsxVFR_
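Since only the test database is synchronized here, the instance's table filter (a standard instance property in the template) can optionally be narrowed, for example:

# capture only tables in the test database
canal.instance.filter.regex=test\\..*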
- Start instance and view the log
(5) Create debug database
Operate on a-lf-bigdata
mysql -uroot -p

# Create the database and table
CREATE DATABASE IF NOT EXISTS test DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
use test;
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `uid` INT UNSIGNED AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `age` int(3) DEFAULT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
(6) Install ClientAdapter
- Decompress
tar -zxf canal.adapter-1.1.5.tar.gz
cd canal-adapter
- Configure environment variables
vi /etc/profile

# Add configuration
#canal-adapter
export CANAL_ADAPTER_HOME=/data/liufei/canal/canal-adapter
export PATH=${CANAL_ADAPTER_HOME}/bin:$PATH

# Make environment variables effective
source /etc/profile
- Modify service configuration
vi $CANAL_ADAPTER_HOME/conf/application.yml

server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: a-lf-bigdata:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://a-lf-bigdata:3306/test?useUnicode=true
      username: canal
      password: 2wsxVFR_
  canalAdapters:
  - instance: test_to_es # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
#      - name: rdb
#        key: mysql1
#        properties:
#          jdbc.driverClassName: com.mysql.jdbc.Driver
#          jdbc.url: jdbc:mysql://127.0.0.1:3306/mytest2?useUnicode=true
#          jdbc.username: root
#          jdbc.password: 121212
#      - name: rdb
#        key: oracle1
#        properties:
#          jdbc.driverClassName: oracle.jdbc.OracleDriver
#          jdbc.url: jdbc:oracle:thin:@localhost:49161:XE
#          jdbc.username: mytest
#          jdbc.password: m121212
#      - name: rdb
#        key: postgres1
#        properties:
#          jdbc.driverClassName: org.postgresql.Driver
#          jdbc.url: jdbc:postgresql://localhost:5432/postgres
#          jdbc.username: postgres
#          jdbc.password: 121212
#          threads: 1
#          commitSize: 3000
#      - name: hbase
#        properties:
#          hbase.zookeeper.quorum: 127.0.0.1
#          hbase.zookeeper.property.clientPort: 2181
#          zookeeper.znode.parent: /hbase
      - name: es6
        hosts: b-lf-bigdata:9300 # 127.0.0.1:9200 for rest mode
        properties:
          mode: transport # or rest
          # security.auth: test:123456 # only used for rest mode
          cluster.name: es
#      - name: kudu
#        key: kudu
#        properties:
#          kudu.master.address: 127.0.0.1 # ',' split multi address
Then create a new mapping file test_to_es.yml under the conf/es6 directory:
dataSourceKey: defaultDS
destination: test_to_es
groupId: g1
esMapping:
  _index: test_test
  _type: _doc
  _id: _id
  upsert: true
  sql: "select a.uid as _id, a.name, a.age, a.modified_time from test a"
  commitBatch: 2
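esMapping.sql defines how rows map to documents: each selected column becomes a field, and the uid column aliased to _id becomes the document id. For an initial full import of rows that already exist, the adapter exposes an ETL endpoint; a sketch of the call, assuming the adapter is reachable on b-lf-bigdata:8081 as configured above:

curl -X POST http://b-lf-bigdata:8081/etl/es6/test_to_es.yml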
- Replace MySQL driver package
- Start adapter
$CANAL_ADAPTER_HOME/bin/startup.sh    # start
$CANAL_ADAPTER_HOME/bin/stop.sh       # stop
$CANAL_ADAPTER_HOME/bin/restart.sh    # restart
(7) Testing
- Insert data
INSERT INTO test.test (name, age) VALUES ("Zhang San",20);
INSERT INTO test.test (name, age) VALUES ("Li Si",21);
INSERT INTO test.test (name, age) VALUES ("Wang Wu",22);
INSERT INTO test.test (name, age) VALUES ("Zhao Liu",23);
INSERT INTO test.test (name, age) VALUES ("Ma Qi",24);
- View the data in ES
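A quick check in Kibana Dev Tools (this assumes the test_test index created earlier):

GET /test_test/_search
{
  "query": { "match_all": {} }
}

The five inserted rows should come back as documents whose _id values match the uid column.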