SQL function of ClickHouse and Update and Delete operations

1.SQL function

1.1 type detection function

select toTypeName(0);  -- UInt8
select toTypeName(-0);  -- UInt8
select toTypeName(1000);  -- UInt16
select toTypeName(-1000);  -- Int16
select toTypeName(10000000);  -- UInt32
select toTypeName(-10000000);  -- Int32
select toTypeName(1.99);  -- Float64
select toTypeName(toFloat32(1.99));  -- Float32
select toTypeName(toDate('2019-12-12')) as dateType, toTypeName(toDateTime('2019-12-12 12:12:12')) as dateTimeType;  -- Date | DateTime
select toTypeName([1,3,5]);  -- Array(UInt8)

1.2 mathematical functions

Function nameeffectusageresult
plusSumselect plus(1,1)=2
minusdifferenceselect ninus(10,5)=5
multiplyquadrature select multiply(2, 2)=4
dividedivisionselect divide(6, 2)
select divide(10, 0)
select divide(0, 0)
intDivInteger divisionselect intDiv(10, 3)=3
intDivOrZeroCalculation quotientselect intDivOrZero(5,2)=2
moduloremainderselect modulo(10, 3)=1
negateReverseselect negate(10)=-10
absabsolute valueselect abs(-10)=10
gcdgreatest common divisorselect gcd(12, 24)=12
lcmLeast common multipleselect lcm(12, 24)=24

1.3 time function

select now() as curDT,toYYYYMM(curDT),toYYYYMMDD(curDT),toYYYYMMDDhhmmss(curDT);
select toDateTime('2019-12-16 14:27:30') as curDT;
select toDate('2019-12-12') as curDT;

2. Use of update / delete

In terms of usage scenarios, Clickhouse is an analytical database. In this scenario, the data is generally unchanged, so Clickhouse's support for update and delete is relatively weak. In fact, it does not support standard update and delete operations.
The following describes the use of update and delete in Clickhouse.

Update and delete syntax:

Clickhouse updates and deletes through alter. It calls update and delete operations mutation. The syntax is:

ALTER TABLE [db.]table DELETE WHERE filter_expr
ALTER TABLE [db.]table UPDATE column1 = expr1 [, ...] WHERE filter_expr

So, what is the difference between mutation and standard update and delete?
The update and delete operations of standard SQL are synchronous, that is, the client needs to wait for the server to return the execution result (usually the int value);
The Clickhouse's update and delete are implemented asynchronously. When the update statement is executed, the server will return immediately, but in fact, the data has not changed at this time, but wait in line.

  • Create table:

    CREATE TABLE tbl_test_users(
        id UInt64, 
        email String, 
        username String, 
        gender UInt8, 
        birthday Date, 
        mobile FixedString(13), 
        pwd String, 
        regDT DateTime, 
        lastLoginDT DateTime, 
        lastLoginIP String
    ) ENGINE=MergeTree() partition by toYYYYMMDD(regDT) order by id settings index_granularity=8192;
  • Insert data into the table of the MergeTree engine

    insert into tbl_test_users(id, email, username, gender, birthday, mobile, pwd, regDT, lastLoginDT, lastLoginIP) values (1,'wcfr817e@yeah.net','nameless',2,'1992-05-31','13306834911','7f930f90eb6604e837db06908cc95149','2008-08-06 11:48:12','2015-05-08 10:51:41',''),(2,'xuwcbev9y@ask.com','Li Si',1,'1983-10-11','15302753472','7f930f90eb6604e837db06908cc95149','2008-08-10 05:37:32','2014-07-28 23:43:04',''),(3,'mgaqfew@126.com','Wang Wu',1,'1970-11-22','15200570030','96802a851b4a7295fb09122b9aa79c18','2008-08-10 11:37:55','2014-07-22 23:45:47',''),(4,'b7zthcdg@163.net','Zhao Liu',1,'2002-02-10','15207308903','96802a851b4a7295fb09122b9aa79c18','2008-08-10 14:47:09','2013-12-26 15:55:02',''),(5,'ezrvy0p@163.net','pseudo-ginseng',1,'1987-09-01','13005861359','96802a851b4a7295fb09122b9aa79c18','2008-08-12 21:58:11','2013-12-26 15:52:33','');
  • Update data

    ALTER TABLE tbl_test_users UPDATE username='Zhang San' WHERE id=1;
    select * from tbl_test_users;
  • Delete data

    ALTER TABLE tbl_test_users DELETE WHERE id=1;
    select * from tbl_test_users;
  • Viewing the mutation queue

    How to check whether the data has been updated?
    You can view relevant information through the system.mutations table:

    FROM system.mutations
    ORDER BY create_time DESC
        LIMIT 10;
    database: Library name
    table: Table name
    command: to update/Delete statement
    create_time: mutation Task creation time. The system processes data changes in this chronological order
    is_done: Is it complete? 1 is complete, 0 is incomplete

    Through the above information, you can view which changes have been completed, is_ When done is 1, it means that it has been completed.

  • Specific process of Mutation

    First, use the where condition to find the partition to be modified;
    Then, rebuild each partition and replace the old partition with a new partition. Once the partition is replaced, it cannot be rolled back;

    For each partition, it can be considered atomic; However, for the entire mutation, if multiple partitions are involved, it is not atomic.

  • matters needing attention

    The update feature does not support updating columns about primary keys or partition keys
    The update operation is not atomic, that is, during the update process, some of the select results are likely to change and some remain unchanged. You can know from the specific process above
    Updates are performed in the order of submission
    Once the update is submitted, it cannot be revoked. Even if the clickhouse service is restarted, it will continue to execute in the order of system.mutations
    Entries that have been updated will not be deleted immediately, and the number of reserved entries is determined by finished_mutations_to_keep storage engine parameter determination. When the amount of data is exceeded, the old entries are deleted
    Updates may get stuck. For example, if the update intvalue='abc 'type of incorrect UPDATE statement fails to execute, it will remain stuck here. At this time, you can use kill music to cancel.


    kill mutation where database='app' and table='test' -- database,table yes system.mutations Fields in table
  • Use suggestions

    According to the official instructions, the use scenario of update/delete is to update a large amount of data at one time, that is, the result of where condition filtering should be a large amount of data.
    For example: alter table test update status = 1, where status = 0 and day ='2020-04-01 ', update the data for one day at a time.
    So, can you update only one piece of data at a time? For example: alter table test update pv=110 where id=100
    Of course, you can, but frequent such operations may cause pressure on the service. This is easy to understand. As mentioned above, the updated unit is the partition. If only one piece of data is updated, a partition needs to be rebuilt; If 100 pieces of data are updated and they may fall on 3 partitions, 3 partitions need to be rebuilt; Relatively speaking, the overall efficiency of updating a batch of data at a time is much higher than updating a row at a time.
    For the scenario of frequent single update, it is recommended to use the replacing mergetree / collapsing mergetree engine to solve it in disguise.

Tags: Big Data SQL clickhouse

Posted on Fri, 08 Oct 2021 23:21:43 -0400 by mattcairns