OLAP query engine, ClickHouse cluster management

1, Introduction to column Library

ClickHouse is an open-source column storage database (DBMS) of Yandex company in Russia in 2016, which is mainly used for OLAP online analysis and processing queries, and can generate analysis data reports in real time using SQL queries.

Columnar storage

For row storage and column storage, the organization structure of data on disk is fundamentally different. For data analysis and calculation, row storage needs to traverse the whole table, while column storage only needs to traverse a single column, so column library is more suitable for making large and wide tables for data analysis and calculation.

Let's go on: note that the scenario compared here is the scenario of data analysis and calculation.

2, Cluster configuration

1. Basic environment

ClickHouse single service has been installed by default

2. Remove file restrictions

vim /etc/security/limits.confvim /etc/security/limits.d/90-nproc.conf Append at the end of the file* soft nofile 65536 * hard nofile 65536 * soft nproc 131072 * hard nproc 131072

3. Cancel SELINUX

Restart after modifying SELINUX=disabled in / etc/selinux/config

4. Cluster profile

Add cluster configuration for services: vim /etc/metrika.xml

<yandex><clickhouse_remote_servers> <clickhouse_cluster> <shard> <internal_replication>true</internal_replication> <replica> <host>192.168.72.133</host> <port>9000</port> </replica> </shard> <shard> <replica> <internal_replication>true</internal_replication> <host>192.168.72.136</host> <port>9000</port> </replica> </shard> <shard> <internal_replication>true</internal_replication> <replica> <host>192.168.72.137</host> <port>9000</port> </replica> </shard> </clickhouse_cluster></clickhouse_remote_servers><zookeeper-servers> <node index="1"> <host>192.168.72.133</host> <port>2181</port> </node> <node index="2"> <host>192.168.72.136</host> <port>2181</port> </node> <node index="3"> <host>192.168.72.137</host> <port>2181</port> </node></zookeeper-servers><macros> <replica>192.168.72.133</replica></macros><networks> <ip>::/0</ip></networks><clickhouse_compression><case> <min_part_size>10000000000</min_part_size> <min_part_size_ratio>0.01</min_part_size_ratio> <method>lz4</method></case></clickhouse_compression></yandex>

Pay attention here

<macros> <replica>192.168.72.133</replica></macros>

Configure the IP address of each service.

5. Start cluster

Start three services respectively

service clickhouse-server start

6. Log in to the client to view

Just log in to any service here

clickhouse-clienten-master :) select * from system.clusters

Here is the cluster name: clickhouse_cluster, for later use.

7. Basic environment test

Create table structure on three services at the same time.

CREATE TABLE ontime_local (FlightDate Date,Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);

133 environment creation distribution table

CREATE TABLE ontime_all AS ontime_local ENGINE = Distributed(clickhouse_cluster, default, ontime_local, rand());

Write any service data

insert into ontime_local (FlightDate,Year) values ('2020-03-12',2020);

Query summary

select * from ontime_all;

Write to the master table, and the data will be distributed to each single table

insert into ontime_all (FlightDate,Year)values('2001-10-12',2001);insert into ontime_all (FlightDate,Year)values('2002-10-12',2002);insert into ontime_all (FlightDate,Year)values('2003-10-12',2003);

Shut down a service at will, and the cluster query directly hangs up

3, Cluster environment integration

1. Basic configuration

url: configure all service lists, mainly used to manage table structure and batch processing;

Cluster: cluster connection service, which can be configured based on Nginx agent service;

spring: datasource: type: com.alibaba.druid.pool.DruidDataSource click: driverClassName: ru.yandex.clickhouse.ClickHouseDriver url: jdbc:clickhouse://127.0.0.1:8123/default,jdbc:clickhouse://127.0.0.1:8123/default,jdbc:clickhouse://127.0.0.1:8123/default cluster: jdbc:clickhouse://127.0.0.1:8123/default initialSize: 10 maxActive: 100 minIdle: 10 maxWait: 6000

2. Management interface

Create tables and write data to each single node service respectively: which is the best Department of gynecology hospital in Zhengzhou http://jbk.39.net/yiyuanzaixian/sysdfkyy/

data_ Shard (single node data)

data_ All (distribution data)

@RestControllerpublic class DataShardWeb { @Resource private JdbcFactory jdbcFactory ; /** * Infrastructure creation */ @GetMapping("/createTable") public String createTable (){ List<JdbcTemplate> jdbcTemplateList = jdbcFactory.getJdbcList(); for (JdbcTemplate jdbcTemplate:jdbcTemplateList){ jdbcTemplate.execute("CREATE TABLE data_shard (FlightDate Date,Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)"); jdbcTemplate.execute("CREATE TABLE data_all AS data_shard ENGINE = Distributed(clickhouse_cluster, default, data_shard, rand())"); } return "success" ; } /** * Node table write data */ @GetMapping("/insertData") public String insertData (){ List<JdbcTemplate> jdbcTemplateList = jdbcFactory.getJdbcList(); for (JdbcTemplate jdbcTemplate:jdbcTemplateList){ jdbcTemplate.execute("insert into data_shard (FlightDate,Year) values ('2020-04-12',2020)"); } return "success" ; } }

3. Cluster query

After the above steps are completed, you can connect the cluster service to query the data of the distribution master table and the single table. Zhengzhou artificial insemination cost: http://jbk.39.net/yiyuanfengcai/tsyl_zztjyy/3104/

Druid based connection

@Configurationpublic class DruidConfig { @Resource private JdbcParamConfig jdbcParamConfig ; @Bean public DataSource dataSource() { DruidDataSource datasource = new DruidDataSource(); datasource.setUrl(jdbcParamConfig.getCluster()); datasource.setDriverClassName(jdbcParamConfig.getDriverClassName()); datasource.setInitialSize(jdbcParamConfig.getInitialSize()); datasource.setMinIdle(jdbcParamConfig.getMinIdle()); datasource.setMaxActive(jdbcParamConfig.getMaxActive()); datasource.setMaxWait(jdbcParamConfig.getMaxWait()); return datasource; } }

mapper based query

<mapper namespace="com.ckhouse.cluster.mapper.DataAllMapper"> <resultMap id="BaseResultMap" type="com.ckhouse.cluster.entity.DataAllEntity"> <result column="FlightDate" jdbcType="VARCHAR" property="flightDate" /> <result column="Year" jdbcType="INTEGER" property="year" /> </resultMap> <select id="getList" resultMap="BaseResultMap" > select * from data_all where Year=2020 </select></mapper>

4, Source code address GitHub · address https://github.com/cicadasmile/data-manage-parentGitEE ·Address https://gitee.com/cicadasmile/data-manage-parent