ClickHouse is an open-source column storage database (DBMS) of Yandex company in Russia in 2016, which is mainly used for OLAP online analysis and processing queries, and can generate analysis data reports in real time using SQL queries.
Columnar storage
For row storage and column storage, the organization structure of data on disk is fundamentally different. For data analysis and calculation, row storage needs to traverse the whole table, while column storage only needs to traverse a single column, so column library is more suitable for making large and wide tables for data analysis and calculation.
Let's go on: note that the scenario compared here is the scenario of data analysis and calculation.
2, Cluster configuration1. Basic environment
ClickHouse single service has been installed by default
2. Remove file restrictions
vim /etc/security/limits.confvim /etc/security/limits.d/90-nproc.conf Append at the end of the file* soft nofile 65536 * hard nofile 65536 * soft nproc 131072 * hard nproc 131072
3. Cancel SELINUX
Restart after modifying SELINUX=disabled in / etc/selinux/config
4. Cluster profile
Add cluster configuration for services: vim /etc/metrika.xml
<yandex><clickhouse_remote_servers> <clickhouse_cluster> <shard> <internal_replication>true</internal_replication> <replica> <host>192.168.72.133</host> <port>9000</port> </replica> </shard> <shard> <replica> <internal_replication>true</internal_replication> <host>192.168.72.136</host> <port>9000</port> </replica> </shard> <shard> <internal_replication>true</internal_replication> <replica> <host>192.168.72.137</host> <port>9000</port> </replica> </shard> </clickhouse_cluster></clickhouse_remote_servers><zookeeper-servers> <node index="1"> <host>192.168.72.133</host> <port>2181</port> </node> <node index="2"> <host>192.168.72.136</host> <port>2181</port> </node> <node index="3"> <host>192.168.72.137</host> <port>2181</port> </node></zookeeper-servers><macros> <replica>192.168.72.133</replica></macros><networks> <ip>::/0</ip></networks><clickhouse_compression><case> <min_part_size>10000000000</min_part_size> <min_part_size_ratio>0.01</min_part_size_ratio> <method>lz4</method></case></clickhouse_compression></yandex>
Pay attention here
<macros> <replica>192.168.72.133</replica></macros>
Configure the IP address of each service.
5. Start cluster
Start three services respectively
service clickhouse-server start
6. Log in to the client to view
Just log in to any service here
clickhouse-clienten-master :) select * from system.clusters
Here is the cluster name: clickhouse_cluster, for later use.
7. Basic environment test
Create table structure on three services at the same time.
CREATE TABLE ontime_local (FlightDate Date,Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
133 environment creation distribution table
CREATE TABLE ontime_all AS ontime_local ENGINE = Distributed(clickhouse_cluster, default, ontime_local, rand());
Write any service data
insert into ontime_local (FlightDate,Year) values ('2020-03-12',2020);
Query summary
select * from ontime_all;
Write to the master table, and the data will be distributed to each single table
insert into ontime_all (FlightDate,Year)values('2001-10-12',2001);insert into ontime_all (FlightDate,Year)values('2002-10-12',2002);insert into ontime_all (FlightDate,Year)values('2003-10-12',2003);
Shut down a service at will, and the cluster query directly hangs up
3, Cluster environment integration1. Basic configuration
url: configure all service lists, mainly used to manage table structure and batch processing;
Cluster: cluster connection service, which can be configured based on Nginx agent service;
spring: datasource: type: com.alibaba.druid.pool.DruidDataSource click: driverClassName: ru.yandex.clickhouse.ClickHouseDriver url: jdbc:clickhouse://127.0.0.1:8123/default,jdbc:clickhouse://127.0.0.1:8123/default,jdbc:clickhouse://127.0.0.1:8123/default cluster: jdbc:clickhouse://127.0.0.1:8123/default initialSize: 10 maxActive: 100 minIdle: 10 maxWait: 6000
2. Management interface
Create tables and write data to each single node service respectively: which is the best Department of gynecology hospital in Zhengzhou http://jbk.39.net/yiyuanzaixian/sysdfkyy/
data_ Shard (single node data)
data_ All (distribution data)
@RestControllerpublic class DataShardWeb { @Resource private JdbcFactory jdbcFactory ; /** * Infrastructure creation */ @GetMapping("/createTable") public String createTable (){ List<JdbcTemplate> jdbcTemplateList = jdbcFactory.getJdbcList(); for (JdbcTemplate jdbcTemplate:jdbcTemplateList){ jdbcTemplate.execute("CREATE TABLE data_shard (FlightDate Date,Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)"); jdbcTemplate.execute("CREATE TABLE data_all AS data_shard ENGINE = Distributed(clickhouse_cluster, default, data_shard, rand())"); } return "success" ; } /** * Node table write data */ @GetMapping("/insertData") public String insertData (){ List<JdbcTemplate> jdbcTemplateList = jdbcFactory.getJdbcList(); for (JdbcTemplate jdbcTemplate:jdbcTemplateList){ jdbcTemplate.execute("insert into data_shard (FlightDate,Year) values ('2020-04-12',2020)"); } return "success" ; } }
3. Cluster query
After the above steps are completed, you can connect the cluster service to query the data of the distribution master table and the single table. Zhengzhou artificial insemination cost: http://jbk.39.net/yiyuanfengcai/tsyl_zztjyy/3104/
Druid based connection
@Configurationpublic class DruidConfig { @Resource private JdbcParamConfig jdbcParamConfig ; @Bean public DataSource dataSource() { DruidDataSource datasource = new DruidDataSource(); datasource.setUrl(jdbcParamConfig.getCluster()); datasource.setDriverClassName(jdbcParamConfig.getDriverClassName()); datasource.setInitialSize(jdbcParamConfig.getInitialSize()); datasource.setMinIdle(jdbcParamConfig.getMinIdle()); datasource.setMaxActive(jdbcParamConfig.getMaxActive()); datasource.setMaxWait(jdbcParamConfig.getMaxWait()); return datasource; } }
mapper based query
<mapper namespace="com.ckhouse.cluster.mapper.DataAllMapper"> <resultMap id="BaseResultMap" type="com.ckhouse.cluster.entity.DataAllEntity"> <result column="FlightDate" jdbcType="VARCHAR" property="flightDate" /> <result column="Year" jdbcType="INTEGER" property="year" /> </resultMap> <select id="getList" resultMap="BaseResultMap" > select * from data_all where Year=2020 </select></mapper>4, Source code address GitHub · address https://github.com/cicadasmile/data-manage-parentGitEE ·Address https://gitee.com/cicadasmile/data-manage-parent