go-mysql-transfer: a real-time incremental synchronization tool for MySQL data


go-mysql-transfer, real-time incremental synchronization of MySQL data: https://blog.csdn.net/weixin_...

Notes on installing and configuring Elasticsearch, with Kibana basics: https://blog.csdn.net/weixin_...

go-mysql-transfer official manual: https://www.kancloud.cn/wj596...

Notes on setting up the Go environment: https://blog.csdn.net/weixin_...

Technology selection: MySQL 8 + go-mysql-transfer + Elasticsearch 7.13

Introduction

go-mysql-transfer is a real-time incremental synchronization tool for MySQL databases. It requires a Go environment.

It monitors changes in the MySQL binary log (binlog), assembles the changed content into messages in a specified format, and sends them to the receiver in real time, forming a high-performance, low-latency incremental synchronization pipeline between the database and the receiving end.

For work I needed to study canal, Alibaba's open-source component for incremental binlog subscription and consumption. It is powerful and runs stably, but it fell short of our requirements in three ways:

1. You have to write your own client to consume the data canal parses.

2. It uses a server-client model, so both components must be deployed. The six business databases in our project need to be synchronized to Redis in real time, which would mean deploying 12 additional components, raising hardware and operations costs.

3. Data travels from the server to the client through an extra round of network transmission, serialization and deserialization before reaching the receiver; writing directly to the receiver feels more efficient.

Prerequisites

  • The MySQL server must have binlog enabled in ROW mode.
  • The mysqldump command is used, so it must be installed on the server where the process runs.
  • This tool is developed in Go and must be built with Go 1.9+.
  • New local Elasticsearch versions (7.13+) must have security mode turned off.

    Add to the yml configuration:

    xpack.security.enabled: false
  • Available MySQL, Elasticsearch and Kibana instances, with sufficiently broad permissions.

    • The MySQL binlog must be in ROW mode
    • Each MySQL table to be synchronized must contain a primary key, otherwise it is ignored. Without a primary key, UPDATE and DELETE operations cannot find the corresponding document in ES and would not be synchronized
    • Modifying the table structure while the program is running is not supported
    • Grant the account used to connect to MySQL the REPLICATION SLAVE and RELOAD privileges, as well as the SUPER privilege:
    GRANT REPLICATION SLAVE ON *.* TO 'elastic'@'IP';
    GRANT RELOAD ON *.* TO 'elastic'@'IP';
    UPDATE mysql.user SET Super_Priv='Y' WHERE user='elastic' AND host='IP';
    FLUSH PRIVILEGES;

Features

1. Simple, does not depend on other components, one-click deployment
2. Integrates multiple receivers such as Redis, MongoDB, Elasticsearch, RocketMQ, Kafka, RabbitMQ and HTTP API; no client to write, usable out of the box
3. Rich built-in data parsing and message generation rules, with template syntax support
4. Supports Lua script extensions for complex logic such as data conversion, cleansing and enrichment
5. Integrated Prometheus client, supporting monitoring and alerting
6. Integrated web admin monitoring page
7. Supports high-availability cluster deployment
8. Retries automatically when data synchronization fails
9. Supports full data initialization

Comparison with similar tools

characteristic | Canal | mysql_stream | go-mysql-transfer | Maxwell
development language | Java | Python | Golang | Java
high availability | supported | supported | supported | supported
receiving end | custom client coding | Kafka et al. (MQ) | Redis, MongoDB, Elasticsearch, RabbitMQ, Kafka, RocketMQ, HTTP API, etc. | Kafka, Kinesis, RabbitMQ, Redis, Google Cloud Pub/Sub, files, etc.
full data initialization | not supported | supported | supported | supported
data format | custom client coding | JSON (fixed format) | JSON (rule-configured), template syntax, Lua script | JSON

Performance (4-8 TPS)

Implementation principle

1. go-mysql-transfer disguises itself as a slave of MySQL;

2. it sends the dump protocol to the master to obtain the binlog, then parses the binlog and generates messages;

3. it pushes the generated messages to the receiver in real time and in batches.
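go-mysql-transfer is built on top of the go-mysql replication library; the following minimal sketch illustrates the same principle using that library directly (the package path, credentials and binlog position are illustrative assumptions, not go-mysql-transfer's actual code):

package main

import (
	"context"
	"fmt"

	"github.com/go-mysql-org/go-mysql/mysql"
	"github.com/go-mysql-org/go-mysql/replication"
)

func main() {
	// Register this process as a replication client (a disguised slave).
	cfg := replication.BinlogSyncerConfig{
		ServerID: 1001, // must be unique among the master's slaves
		Flavor:   "mysql",
		Host:     "127.0.0.1",
		Port:     3306,
		User:     "elastic",
		Password: "elastic",
	}
	syncer := replication.NewBinlogSyncer(cfg)

	// Send the dump protocol from a known binlog position (see SHOW MASTER STATUS).
	streamer, err := syncer.StartSync(mysql.Position{Name: "mysql-bin.000025", Pos: 993779648})
	if err != nil {
		panic(err)
	}

	for {
		ev, err := streamer.GetEvent(context.Background())
		if err != nil {
			panic(err)
		}
		// Row events carry the changed rows; a real pipeline would turn them
		// into messages here and push them to the receiver.
		if rows, ok := ev.Event.(*replication.RowsEvent); ok {
			fmt.Printf("%s %s.%s %v\n", ev.Header.EventType, rows.Table.Schema, rows.Table.Table, rows.Rows)
		}
	}
}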

Deploying and running go-mysql-transfer

Enable the MySQL binlog
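A minimal my.cnf sketch for enabling ROW-mode binlog (these are standard MySQL options; the values are illustrative, adjust them to your environment):

# /etc/my.cnf
[mysqld]
server-id = 1            # unique server ID, required for replication
log-bin = mysql-bin      # enable the binary log
binlog_format = ROW      # go-mysql-transfer requires ROW mode
binlog_row_image = FULL  # log full row images

After restarting MySQL, SHOW VARIABLES LIKE 'binlog_format'; should return ROW.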

Modify app.yml
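A minimal sketch of the connection section of app.yml; the key names follow the project's sample configuration, so verify them against your release:

# app.yml (connection section, illustrative values)
addr: 127.0.0.1:3306   # MySQL address
user: elastic          # MySQL user; needs the privileges granted above
pass: elastic          # MySQL password
charset: utf8          # charset
slave_id: 1001         # unique slave ID this process registers with
flavor: mysql          # mysql or mariadb
target: elasticsearch  # receiver type, e.g. redis, elasticsearch, rocketmq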

Run from the command line
On Windows, run go-mysql-transfer.exe directly.
On Linux, execute: nohup ./go-mysql-transfer &

Monitoring

go-mysql-transfer supports two monitoring modes: Prometheus and the built-in web admin.

Related configuration:

# web admin related configuration
enable_web_admin: true #whether to enable web admin; false by default
web_admin_port: 8060 #web monitoring port; 8060 by default

Visit 127.0.0.1:8060 directly to see the monitoring interface.

Synchronize data to Elasticsearch

The relevant configuration in the configuration file is as follows:

# app.yml
#target type
target: elasticsearch 

#elasticsearch connection configuration
es_addrs: 127.0.0.1:9200 #connection addresses; separate multiple with commas
es_version: 7 # Elasticsearch version; 6 and 7 are supported, 7 by default
#es_user:      # user name
#es_password:  # password

At present, Elasticsearch 6 and Elasticsearch 7 are supported.

Rule-based synchronization

The relevant configurations are as follows:

rule:
  -
    schema: eseap #database name
    table: t_user #table name
    #order_by_column: id #sort column; must not be empty during full (stock) data synchronization
    #column_lower_case: true #convert column names to lowercase; false by default
    #column_upper_case: false #convert column names to uppercase; false by default
    column_underscore_to_camel: true #convert underscored column names to camelCase; false by default
    #include_columns: ID,USER_NAME,PASSWORD # columns to include, comma separated; empty means all columns
    #exclude_columns: BIRTHDAY,MOBIE # columns to exclude, comma separated; empty by default
    #default_column_values: area_name=Hefei  #default column values; separate multiple with commas, e.g. source=binlog,area_name=Hefei
    #date_formatter: yyyy-MM-dd #format for date types; defaults to yyyy-MM-dd if not set
    #datetime_formatter: yyyy-MM-dd HH:mm:ss #format for datetime and timestamp types; defaults to yyyy-MM-dd HH:mm:ss if not set

    #Elasticsearch related
    es_index: user_index #index name; may be empty, defaults to the table name
    #es_mappings: #index mappings; may be empty, in which case they are derived from the column types
    #  -
    #    column: REMARK #database column name
    #    field: remark #mapped ES field name
    #    type: text #ES field type
    #    analyzer: ik_smart #ES analyzer; only meaningful when type is text
    #    #format: #date format; only meaningful when type is date
    #  -
    #    column: USER_NAME #database column name
    #    field: account #mapped ES field name
    #    type: keyword #ES field type

Rule example

The t_user table contains the following data:
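The table appears as a screenshot in the original post; the following SQL is a plausible reconstruction of its shape, inferred from the columns used in the examples below (the REMARK value is invented for illustration):

-- Assumed shape of t_user, inferred from the columns referenced later
CREATE TABLE t_user (
  ID          BIGINT PRIMARY KEY,
  USER_NAME   VARCHAR(50),
  PASSWORD    VARCHAR(50),
  CREATE_TIME DATETIME,
  REMARK      VARCHAR(200)
);
INSERT INTO t_user VALUES (1001, 'admin', '123456', '2020-07-20 14:29:19', 'administrator');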

Example 1

Using the configuration above, the mapping is created automatically and the data is synchronized to Elasticsearch as shown below.
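The mapping and data screenshots are not reproduced here. With column_underscore_to_camel: true, underscored column names become camelCase fields, so the synchronized document should look roughly like this (values taken from the assumed sample row above):

{
  "id": "1001",
  "userName": "admin",
  "password": "123456",
  "createTime": "2020-07-20 14:29:19",
  "remark": "administrator"
}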

Example 2

The configuration is as follows:

rule:
  -
    schema: eseap #database name
    table: t_user #table name
    order_by_column: id #sort column; must not be empty during full (stock) data synchronization
    column_lower_case: true #convert column names to lowercase; false by default
    #column_upper_case: false #convert column names to uppercase; false by default
    #column_underscore_to_camel: true #convert underscored column names to camelCase; false by default
    #include_columns: ID,USER_NAME,PASSWORD # columns to include, comma separated; empty means all columns
    #exclude_columns: BIRTHDAY,MOBIE # columns to exclude, comma separated; empty by default
    default_column_values: area_name=Hefei  #default column values; separate multiple with commas, e.g. source=binlog,area_name=Hefei
    #date_formatter: yyyy-MM-dd #format for date types; defaults to yyyy-MM-dd if not set
    #datetime_formatter: yyyy-MM-dd HH:mm:ss #format for datetime and timestamp types; defaults to yyyy-MM-dd HH:mm:ss if not set

    #Elasticsearch related
    es_index: user_index #index name; may be empty, defaults to the table name
    es_mappings: #index mappings; may be empty, in which case they are derived from the column types
      -
        column: REMARK #database column name
        field: remark #mapped ES field name
        type: text #ES field type
        analyzer: ik_smart #ES analyzer; only meaningful when type is text
        #format: #date format; only meaningful when type is date
      -
        column: USER_NAME #database column name
        field: account #mapped ES field name
        type: keyword #ES field type

The es_mappings configuration item defines the mappings (mapping relationships) of the index. When es_mappings is not defined, the column types are used to create the index mappings automatically.

The mapping is created accordingly, and the data is synchronized to Elasticsearch as shown below.
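The screenshots are not reproduced here. Roughly, with column names lowercased, USER_NAME mapped to the account field by es_mappings, and the default area_name column appended, the synchronized document should look like this (values again from the assumed sample row):

{
  "id": "1001",
  "account": "admin",
  "password": "123456",
  "create_time": "2020-07-20 14:29:19",
  "remark": "administrator",
  "area_name": "Hefei"
}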

Lua-based script synchronization

More complex data processing logic can be implemented with Lua scripts. go-mysql-transfer supports Lua 5.1 syntax.

Lua example

The t_user table again contains the data shown above.

Example 1

Reference the Lua script in the rule configuration:

rule:
  -
    schema: eseap #database name
    table: t_user #table name
    order_by_column: id #sort column; must not be empty during full (stock) data synchronization
    lua_file_path: lua/t_user_es.lua   #lua script file
    es_index: user_index #Elasticsearch index name; may be empty, defaults to the table name
    es_mappings: #index mappings; may be empty, in which case they are derived from the data types
      -
        field: id #mapped ES field name
        type: keyword #ES field type
      -
        field: userName #mapped ES field name
        type: keyword #ES field type
      -
        field: password #mapped ES field name
        type: keyword #ES field type
      -
        field: createTime #mapped ES field name
        type: date #ES field type
        format: yyyy-MM-dd HH:mm:ss #date format; only meaningful when type is date
      -
        field: remark #mapped ES field name
        type: text #ES field type
        analyzer: ik_smart #ES analyzer; only meaningful when type is text
      -
        field: source #mapped ES field name
        type: keyword #ES field type

Among them,
es_mappings defines the mappings (mapping relationships) of the index. When es_mappings is not defined, mappings are created automatically from the field values. The mapping generated from the es_mappings above is as follows:
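Inferred from the es_mappings entries above (the original screenshot is not reproduced), the generated mapping should look roughly like this:

{
  "mappings": {
    "properties": {
      "id":         { "type": "keyword" },
      "userName":   { "type": "keyword" },
      "password":   { "type": "keyword" },
      "createTime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
      "remark":     { "type": "text", "analyzer": "ik_smart" },
      "source":     { "type": "keyword" }
    }
  }
}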

Lua script:

local ops = require("esOps") --load elasticsearch Operation module

local row = ops.rawRow()  --A row of data in the current database,table Type, key Is the column name
local action = ops.rawAction()  --Current database events,include: insert,update,delete

local id = row["ID"] --obtain ID The value of the column
local userName = row["USER_NAME"] --obtain USER_NAME The value of the column
local password = row["PASSWORD"] --obtain USER_NAME The value of the column
local createTime = row["CREATE_TIME"] --obtain CREATE_TIME The value of the column
local remark = row["REMARK"] --obtain REMARK The value of the column

local result = {}  -- Define a table,As a result set
result["id"] = id
result["userName"] = userName
result["password"] = password
result["createTime"] = createTime
result["remark"] = remark
result["source"] = "binlog" -- data sources 

if action == "insert" then -- Listen only for new events
    ops.INSERT("t_user",id,result) -- New, parameter 1 is index name, string Type; Parameter 2 is the primary key of the data to be inserted; Parameter 3 is the data to be inserted, tablele Type or json character string
end 

The data synchronized to Elasticsearch is as follows:
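Illustratively (the original screenshot is not reproduced; values assumed from the sample row), the document produced by this script:

{
  "id": "1001",
  "userName": "admin",
  "password": "123456",
  "createTime": "2020-07-20 14:29:19",
  "remark": "administrator",
  "source": "binlog"
}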

Example 2

Reference the Lua script in the rule configuration:

    schema: eseap #database name
    table: t_user #table name
    lua_file_path: lua/t_user_es2.lua   #lua script file

If the index name and mappings are not explicitly defined, ES will automatically create an index named t_user based on the values.

Use the following script:

local ops = require("esOps") --load elasticsearch Operation module

local row = ops.rawRow()  --A row of data in the current database,table Type, key Is the column name
local action = ops.rawAction()  --Current database events,include: insert,update,delete

local id = row["ID"] --obtain ID The value of the column
local userName = row["USER_NAME"] --obtain USER_NAME The value of the column
local password = row["PASSWORD"] --obtain USER_NAME The value of the column
local createTime = row["CREATE_TIME"] --obtain CREATE_TIME The value of the column
local result = {}  -- Define a table,As a result set
result["id"] = id
result["userName"] = userName
result["password"] = password
result["createTime"] = createTime
result["remark"] = remark
result["source"] = "binlog" -- data sources 

if action == "insert" then -- Listen only for new events
    ops.INSERT("t_user",id,result) -- New, parameter 1 is index name, string Type; Parameter 2 is the primary key of the data to be inserted; Parameter 3 is the data to be inserted, tablele Type or json character string
end 

The data synchronized to Elasticsearch takes the same shape as in Example 1.

esOps module

The methods provided are as follows:

  1. INSERT: insert operation, e.g. ops.INSERT(index,id,result). The parameter index is the index name, a string; the parameter id is the primary key of the data to insert; the parameter result is the data to insert, a table or a JSON string. A sketch covering all three methods follows this list.
  2. UPDATE: update operation, e.g. ops.UPDATE(index,id,result). The parameter index is the index name, a string; the parameter id is the primary key of the data to update; the parameter result is the data to update, a table or a JSON string.
  3. DELETE: delete operation, e.g. ops.DELETE(index,id). The parameter index is the index name, a string; the parameter id is the primary key of the data to delete, of any type.
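A minimal Lua sketch routing all three event types through these methods (the index name user_index is an assumption):

local ops = require("esOps") --load the Elasticsearch operation module

local row = ops.rawRow()       --current row, a table keyed by column name
local action = ops.rawAction() --insert, update or delete

local id = row["ID"]
local result = { id = id, userName = row["USER_NAME"] }

if action == "insert" then
    ops.INSERT("user_index", id, result)  --create the document
elseif action == "update" then
    ops.UPDATE("user_index", id, result)  --overwrite the document
elseif action == "delete" then
    ops.DELETE("user_index", id)          --remove the document
end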

Synchronize data to RocketMQ

RocketMQ configuration

The relevant configurations are as follows:

# app.yml

target: rocketmq #Target type
#rocketmq connection configuration
rocketmq_name_servers: 127.0.0.1:9876 #rocketmq name server addresses; separate multiple with commas
#rocketmq_group_name: transfer_test_group #rocketmq group name; empty by default
#rocketmq_instance_name: transfer_test_group_ins #rocketmq instance name; empty by default
#rocketmq_access_key: RocketMQ #access-control accessKey; empty by default
#rocketmq_secret_key: 12345678 #access-control secretKey; empty by default

Rule-based synchronization

The relevant configurations are as follows:

rule:
  -
    schema: eseap #database name
    table: t_user #table name
    #order_by_column: id #sort column; must not be empty during full (stock) data synchronization
    #column_lower_case: false #convert column names to lowercase; false by default
    #column_upper_case: false #convert column names to uppercase; false by default
    column_underscore_to_camel: true #convert underscored column names to camelCase; false by default
    #include_columns: ID,USER_NAME,PASSWORD # columns to include, comma separated; empty means all columns
    #exclude_columns: BIRTHDAY,MOBIE # columns to exclude, comma separated; empty by default
    #column_mappings: CARD_NO=sfz    #column name mappings; separate multiple with commas, e.g. USER_NAME=account maps the field name USER_NAME to account
    #default_column_values: source=binlog,area_name=Hefei  #default column values; separate multiple with commas, e.g. source=binlog,area_name=Hefei
    #date_formatter: yyyy-MM-dd #format for date types; defaults to yyyy-MM-dd if not set
    #datetime_formatter: yyyy-MM-dd HH:mm:ss #format for datetime and timestamp types; defaults to yyyy-MM-dd HH:mm:ss if not set
    value_encoder: json  #value encoding; supports json, kv-commas and v-commas; json by default
    #value_formatter: '{{.ID}}|{{.USER_NAME}}|{{.REAL_NAME}}|{{if eq .STATUS 0}}disabled{{else}}enabled{{end}}'
    
    #rocketmq related
    rocketmq_topic: transfer_test_topic #rocketmq topic; may be empty, defaults to the table name
    #reserve_raw_data: false #keep the pre-update data; useful for rocketmq, kafka and rabbitmq; false by default

Among them,
value_encoder specifies the value encoding format. It supports json, kv-commas and v-commas; if omitted, it defaults to json, as shown in the following table:

format | explanation | example
json | JSON | {"id": "1001","userName": "admin","password": "123456", "createTime": "2020-07-20 14:29:19"}
kv-commas | key=value pairs, comma separated | id=1001,userName=admin,password=123456,createTime=2020-07-20 14:29:19
v-commas | values only, comma separated | 1001,admin,123456,2020-07-20 14:29:19

value_formatter is the template expression used to format the value; see the "expression template" chapter for the syntax. When value_formatter is not empty, value_encoder has no effect.

reserve_raw_data indicates whether to keep the pre-update data, i.e. the old values before modification; not retained by default.

Example

The t_user table contains the data shown above.

Create a topic named transfer_test_topic in RocketMQ. Note that the topic name must be the same as the value of the rocketmq_topic configuration item in the rule.
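If the topic does not exist yet, one way to create it is with the mqadmin tool that ships with RocketMQ (a local name server and the default cluster name are assumed):

sh bin/mqadmin updateTopic -n 127.0.0.1:9876 -c DefaultCluster -t transfer_test_topic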

Example 1

Use the configuration above.

For insert, update and delete events, the corresponding data is synchronized to RocketMQ. When reserve_raw_data is set to true, an update message additionally carries a raw attribute holding the old data from before the update.
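The message screenshots are not reproduced here. Illustratively, with values assumed from the sample row, an update message with reserve_raw_data: true would carry the new values plus the raw attribute with the old row (the exact message envelope may vary by version):

{
  "id": "1001",
  "userName": "admin2",
  "password": "123456",
  "createTime": "2020-07-20 14:29:19",
  "raw": {
    "id": "1001",
    "userName": "admin",
    "password": "123456",
    "createTime": "2020-07-20 14:29:19"
  }
}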

Example 2

The configuration is as follows:

rule:
  -
    schema: eseap #database name
    table: t_user #table name
    #order_by_column: id #sort column; must not be empty during full (stock) data synchronization
    column_lower_case: true #convert column names to lowercase; false by default
    #column_upper_case: false #convert column names to uppercase; false by default
    #column_underscore_to_camel: true #convert underscored column names to camelCase; false by default
    #include_columns: ID,USER_NAME,PASSWORD # columns to include, comma separated; empty means all columns
    #exclude_columns: BIRTHDAY,MOBIE # columns to exclude, comma separated; empty by default
    column_mappings: USER_NAME=account    #column name mappings; separate multiple with commas, e.g. USER_NAME=account maps the field name USER_NAME to account
    default_column_values: area_name=Hefei  #default column values; separate multiple with commas, e.g. source=binlog,area_name=Hefei
    #date_formatter: yyyy-MM-dd #format for date types; defaults to yyyy-MM-dd if not set
    #datetime_formatter: yyyy-MM-dd HH:mm:ss #format for datetime and timestamp types; defaults to yyyy-MM-dd HH:mm:ss if not set
    value_encoder: json  #value encoding; supports json, kv-commas and v-commas; json by default
    #value_formatter: '{{.ID}}|{{.USER_NAME}}|{{.REAL_NAME}}|{{if eq .STATUS 0}}disabled{{else}}enabled{{end}}'

    #rocketmq related
    rocketmq_topic: transfer_test_topic #rocketmq topic; may be empty, defaults to the table name
    #reserve_raw_data: false #keep the pre-update data; useful for rocketmq, kafka and rabbitmq; false by default

Among them,
column_mappings remaps column names.

For an insert event, the data synchronized to RocketMQ is as follows; note that the attribute name user_name is changed to account:
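Illustratively (screenshot not reproduced; values assumed from the sample row):

{
  "id": "1001",
  "account": "admin",
  "password": "123456",
  "create_time": "2020-07-20 14:29:19",
  "area_name": "Hefei"
}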

Example 3

The configuration is as follows:

rule:
  -
    schema: eseap #database name
    table: t_user #table name
    #order_by_column: id #sort column; must not be empty during full (stock) data synchronization
    column_lower_case: true #convert column names to lowercase; false by default
    #column_upper_case: false #convert column names to uppercase; false by default
    #column_underscore_to_camel: true #convert underscored column names to camelCase; false by default
    #include_columns: ID,USER_NAME,PASSWORD # columns to include, comma separated; empty means all columns
    #exclude_columns: BIRTHDAY,MOBIE # columns to exclude, comma separated; empty by default
    #column_mappings: USER_NAME=account    #column name mappings; separate multiple with commas, e.g. USER_NAME=account maps the field name USER_NAME to account
    default_column_values: area_name=Hefei  #default column values; separate multiple with commas, e.g. source=binlog,area_name=Hefei
    #date_formatter: yyyy-MM-dd #format for date types; defaults to yyyy-MM-dd if not set
    #datetime_formatter: yyyy-MM-dd HH:mm:ss #format for datetime and timestamp types; defaults to yyyy-MM-dd HH:mm:ss if not set
    value_encoder: v-commas  #value encoding; supports json, kv-commas and v-commas; json by default
    #value_formatter: '{{.ID}}|{{.USER_NAME}}' # value formatting expression; {{.ID}} is the value of the ID field, {{.USER_NAME}} is the value of the USER_NAME field

    #rocketmq related
    rocketmq_topic: transfer_test_topic #rocketmq topic; may be empty, defaults to the table name

Among them,
value_encoder specifies the message encoding format.

For an insert event, the data synchronized to RocketMQ is as follows:
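Following the v-commas row of the encoding table above, plus the default area_name column, the message body is a bare comma-separated value list, e.g.:

1001,admin,123456,2020-07-20 14:29:19,Hefei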

Synchronize data to Redis

Synchronize data to MongoDB

Synchronize data to RocketMQ

Synchronize data to Kafka

Synchronize data to RabbitMQ

Full data import

Lua script

Lua is a lightweight, compact scripting language designed to be embedded in applications, providing flexible extension and customization capabilities. Developers only need a little time to roughly master Lua's syntax, and can write usable scripts by following the examples.

Thanks to Lua's high extensibility, more complex data parsing and message generation logic can be implemented, customizing the data format as required.

Performance

Performance test address

Summary

  • go-mysql-transfer achieves both incremental and full data synchronization to Elasticsearch
  • insert, update and delete operations can all be synchronized
  • Stability is relatively poor: synchronization has occasionally failed, and the lack of detailed logs makes troubleshooting inconvenient

Common problems

How to reset the synchronization position

1. Stop the go-mysql-transfer application.
2. Execute the show master status statement in the database; you will see results like the following:

File | Position | Binlog-Do-DB | Binlog-Ignore-DB
mysql-bin.000025 | 993779648 |  | 

3. Using the values of the File and Position columns, execute the command:
./go-mysql-transfer -config app.yml -position mysql-bin.000025 993779648

4. Restart the application:
./go-mysql-transfer -config app.yml

How to synchronize multiple tables

Use yml array syntax:

#a set of lines beginning with a dash forms an array
animal: 
  - Cat
  - Dog
  - Goldfish

go-mysql-transfer supports multiple tables in a single database, as well as multiple databases and multiple tables. The configuration is as follows:

rule:
  -
    schema: eseap #Database name
    table: t_user #Table name
    column_underscore_to_camel: true 
    value_encoder: json  
    redis_structure: string 
    redis_key_prefix: USER_
  -
    schema: eseap #Database name
    table: t_sign #Table name
    column_underscore_to_camel: true 
    value_encoder: json  
    redis_structure: string 
    redis_key_prefix: SIGN_
  -
    schema: gojob #Database name
    table: t_triggered #Table name
    column_underscore_to_camel: true 
    value_encoder: json  
    redis_structure: string 
    redis_key_prefix: TRIGGERED_

The t_user and t_sign tables belong to the eseap database, and the t_triggered table belongs to the gojob database.

References:

Official Manual: https://www.kancloud.cn/wj596...

https://blog.csdn.net/weixin_...
