[Tech recommendation] I built a complete big data system for the price of a roujiamo

How could a bargain hunter like me sit out the Double 11 promotion? So I decided to go deal hunting on Tencent Cloud.

Let me share how I built a big data platform on the cloud for the price of a roujiamo. After some digging, I found that grabbing these deals was almost too easy. In the end I bought MySQL for 19.9 yuan and Stream Compute Oceanus (Flink) for 1 yuan, so for a little over 20 yuan I had this big data system up and running.

Architecture diagram:

Here are the concrete deal-hunting steps:

1. Find the Tencent Cloud Double 11 event homepage.

2. Purchase MySQL.

I found MySQL in the flash-sale section: 19.9 yuan for one year.

When trying to buy it, I found that I needed to select an availability zone. The MySQL instance, the Flink (Stream Compute Oceanus) cluster, and the ES cluster all need to be in the same availability zone.

3. Purchase Flink cluster.

Tencent Cloud's Flink platform is called Stream Compute Oceanus. On the Double 11 event homepage it was priced at more than 1,000 yuan. Fortunately, I had kept an eye on the Stream Compute Oceanus product homepage and found that new users can buy a cluster for 1 yuan.

While placing the order for the cluster, I found I also had to spend 1 yuan on COS storage, but that was cheap enough, so I bought it as well.

4. Purchase Elasticsearch cluster

Tencent Cloud's ES didn't have much of a Double 11 discount, since it is aimed mainly at enterprise users. But after browsing the ES product homepage, I found that new users can get a cluster for 0 yuan! It's free. Let's do it.

5. System construction.

So I bought a complete set of big data components for roujiamo money (19.9 yuan for MySQL + 1 yuan for the Flink cluster + 1 yuan for COS storage + 0 yuan for the ES cluster).

Preparation

1. MySQL Cluster preparation

1.1 Create a new MySQL cluster

Enter the MySQL console [1], and in Database Management > Parameter Settings set binlog_row_image=FULL so that the CDC (Change Data Capture) feature can capture data changes in real time.

1.2 data preparation

First, create the testdb database, create the user table in testdb, and insert some data.

user table structure:

Field name     Type           Meaning
user_id        int            User ID
user_name      varchar(50)    User name
create_time    timestamp      Creation time
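The testdb database and the user table above can be created with DDL along these lines (a sketch based on the field list above; adjust types and constraints to taste):

```sql
-- Create the testdb database and the user table described above
CREATE DATABASE IF NOT EXISTS testdb;
USE testdb;

CREATE TABLE `user` (
    `user_id`     INT NOT NULL,    -- User ID
    `user_name`   VARCHAR(50),     -- User name
    `create_time` TIMESTAMP,       -- Creation time
    PRIMARY KEY (`user_id`)
);
```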

Insert two rows into the table:

INSERT INTO `user` (`user_id`, `user_name`, `create_time`) VALUES (1001, 'Xiao Ming', '2021-10-01 00:00:00');
INSERT INTO `user` (`user_id`, `user_name`, `create_time`) VALUES (1002, 'TONY', '2021-10-02 00:00:00');

1.3 Set parameters

Click the instance ID, then on the instance details page click Database Management to enter the Parameter Settings panel, and set binlog_row_image=FULL to enable synchronization of database changes.

MySQL data can be ingested into the Oceanus (Flink) cluster with either the flink-connector-jdbc or the flink-connector-mysql-cdc connector. When using the MySQL CDC feature, the flink-connector-mysql-cdc connector requires the MySQL parameter binlog_row_image=FULL.
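Before creating the Flink job, it's worth confirming from a MySQL client that the parameter actually took effect (a quick sanity check, not part of the original steps):

```sql
-- Should show binlog_row_image = FULL
SHOW VARIABLES LIKE 'binlog_row_image';
-- The CDC connector also requires the binlog to be enabled:
SHOW VARIABLES LIKE 'log_bin';
```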

Create Flink job

1. Create Source

CREATE TABLE `user_source` (
    `user_id` int,
    `user_name` varchar(50),
    PRIMARY KEY (`user_id`) NOT ENFORCED -- If the source table defines a primary key, it must be defined here as well
) WITH (
    'connector' = 'mysql-cdc',      -- Must be 'mysql-cdc'
    'hostname' = '10.0.0.158',      -- Database IP
    'port' = '3306',                -- Database access port
    'username' = 'root',            -- Database user (needs SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT, SELECT and RELOAD privileges)
    'password' = 'yourpassword',    -- Database password
    'database-name' = 'testdb',     -- Database to synchronize
    'table-name' = 'user'           -- Table to synchronize
);
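If you'd rather not hand the connector the root account, a dedicated user with exactly the privileges mentioned in the comment above could be created like this (a sketch; the user name flink_cdc and its password are my own placeholders):

```sql
-- Hypothetical least-privilege user for the mysql-cdc connector
CREATE USER 'flink_cdc'@'%' IDENTIFIED BY 'yourpassword';
GRANT SELECT, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT, RELOAD
    ON *.* TO 'flink_cdc'@'%';
FLUSH PRIVILEGES;
```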

2. Create Sink

-- Elasticsearch can only be used as a data destination (sink) table to write into
-- See https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#elasticsearch-connector

CREATE TABLE es_sink (
    `user_id` INT,
    `user_name` VARCHAR
) WITH (
    'connector.type' = 'elasticsearch',    -- Output to Elasticsearch
    'connector.version' = '6',             -- Elasticsearch version, e.g. '6' or '7'
    'connector.hosts' = 'http://10.0.0.175:9200',  -- Elasticsearch connection address
    'connector.index' = 'user',            -- Elasticsearch index name (must be lowercase)
    'connector.document-type' = 'user',    -- Elasticsearch document type
    'connector.username' = 'elastic',      -- Elasticsearch user name
    'connector.password' = 'yourpassword', -- Elasticsearch password

    'update-mode' = 'upsert',              -- Use 'upsert' mode to capture database changes
    'connector.key-delimiter' = '$',       -- Optional; delimiter for composite primary keys (default '_')
    'connector.key-null-literal' = 'n/a',  -- Literal used when a primary key field is null (default 'null')
    'connector.connection-max-retry-timeout' = '300',  -- Maximum timeout per request (ms)
    'format.type' = 'json'                 -- Output format; currently only 'json' is supported
);

3. Write business SQL

insert into es_sink
(
    select user_id,
    LOWER(user_name) -- The LOWER() function converts the user name to lowercase
    from user_source
);

4. Select Connector

Select the Connector matching the Elasticsearch version you actually purchased. From Flink 1.13 onward, you no longer need to select a Connector manually; it is matched automatically. Then click Save > Publish draft to run the job.

5. Data query

Enter the Elasticsearch console [5], click the Elasticsearch instance purchased earlier, then click Kibana in the upper right corner to open Kibana and query the data. For specific query methods, see Accessing clusters through Kibana [7].
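Once the job is published and running, you can also verify the CDC path end to end by changing the MySQL table and watching the change land in Elasticsearch (an illustrative example, not from the original article):

```sql
-- Both statements should be captured by the mysql-cdc source
-- and upserted into the Elasticsearch index
INSERT INTO `user` (`user_id`, `user_name`, `create_time`) VALUES (1003, 'Alice', '2021-10-03 00:00:00');
UPDATE `user` SET `user_name` = 'Tony Stark' WHERE `user_id` = 1002;
```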

Summary

This big data system uses the MySQL CDC connector to continuously capture database change records, performs basic data transformation in the Stream Compute Oceanus (Flink) job, and finally sinks the results into Elasticsearch. Roujiamo money well spent, yyds!

Reference reading

1: MySQL console: https://console.cloud.tencent.com/cdb

2: Create a MySQL instance: https://cloud.tencent.com/document/product/236/46433

3: Stream Compute Oceanus console: https://console.cloud.tencent.com/oceanus/overview

4: Create an Oceanus exclusive cluster: https://cloud.tencent.com/document/product/849/48298

5: Elasticsearch console: https://console.cloud.tencent.com/es

6: Create an Elasticsearch cluster: https://cloud.tencent.com/document/product/845/19536

7: Access the cluster through Kibana: https://cloud.tencent.com/document/product/845/19541

Tags: MySQL flink

Posted on Thu, 25 Nov 2021 14:40:34 -0500 by jmrothermel