The company's main business is providing bulk SMS services to enterprise customers, backed by a strong internal operations monitoring platform. The daily SMS volume is huge, so the monitoring platform needs a time-series database to aggregate and monitor the massive message flow by geographic region, carrier, delivery status, and other dimensions. We had always used InfluxDB, but queries spanning a slightly longer range, such as one month of data, were painfully slow. TDengine has drawn a lot of attention since it was open-sourced, so I decided to give it a try.
After getting familiar with the basic features, I brought a TDengine-backed version of the monitoring system online, only to find there was no "group by" support in Grafana: the only way to put several SMS-status series on one dashboard was to write multiple statements with different where conditions, as shown in the figure below:
The more where conditions there are, the clumsier and less flexible this becomes.
So I began to study the official documentation carefully and learned about "super tables", "continuous queries", and so on. I ran into quite a few problems along the way and have recorded them here (test environment; the data is simulated).
I. Installation and operation
* Test environment
cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
The installation is simple:
rpm -ivh tdengine-1.6.3.1-3.x86_64.rpm
The configuration is left at its defaults (no changes to the configuration file).
Start TDengine:
systemctl start taosd
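You can confirm the service is up before proceeding (a standard systemd check, nothing TDengine-specific):

systemctl status taosd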
II. Creating the Database and Tables
Enter "taos" on the command line
taos>
The following database and tables are created at this prompt.
1. Create a database
create database jk keep 365 precision 'us';
Notes:
- keep 365 means the database retains data for 365 days; data older than 365 days is cleared automatically (as a time-series database, TDengine does not support delete operations on its tables);
- precision 'us' sets the database's timestamp precision to microseconds (the default is milliseconds, i.e. precision 'ms'); testing shows that single and double quotes both work, but omitting the quotes does not;
- time precision can no longer be configured in the configuration file and must be specified when the database is created (several configuration items I tested did not take effect; this was confirmed officially after I opened an issue).
TDengine was originally designed for the Internet of Things, where millisecond precision for collected device data is sufficient; our SMS platform, however, generates sudden bursts of data, so to avoid possible data loss we set the precision to microseconds. Testing bore this out. The simulated inserts use "now" for the timestamp: on the left of the image below is millisecond precision, where "0 row(s)" results appear, meaning some rows were not inserted; on the right is microsecond precision, with no rows lost.
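To reproduce the comparison, here is a minimal sketch (the database names test_ms and test_us are made up for illustration). The two now values in a single insert statement can fall into the same millisecond; under 'ms' precision the duplicate timestamp means one of the rows may be lost, while under 'us' precision both rows normally land:

taos> create database test_ms precision 'ms';
taos> create database test_us precision 'us';
taos> create table test_ms.t1 (ts timestamp, value int);
taos> create table test_us.t1 (ts timestamp, value int);
taos> insert into test_ms.t1 values (now, 1) (now, 2);
taos> insert into test_us.t1 values (now, 1) (now, 2);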
2. Create Super Table
Access database "jk"
taos> use jk;
Database changed.
create table jiankong (ts timestamp, gatewayid binary(6), companyid binary(20), provinceid binary(10), cityid binary(10), value int, timestr binary(30)) tags(type binary(10), subtype binary(10));
There are three types and hundreds of subtypes in the SMS system, so this static information is kept in tags (put simply: any field you want to group by should be a tag).
A few notes on super tables:
- An STable is an abstraction for data collection points of the same type: a collection of instances of the same type that contains multiple subtables with an identical data structure.
- Each STable defines a table structure and a set of tags for its subtables: the table structure is the data columns recorded in the table and their data types; the tag names and types are defined by the STable, while the tag values record the static information of each subtable and are used to group and filter subtables.
- A subtable is essentially an ordinary table consisting of a timestamp primary key and several data columns; each row records concrete data, and data queries work exactly as on regular tables. The difference is that each subtable belongs to a super table and carries the set of tag values defined by the STable.
- Aggregate queries across all subtables created from an STable support filtering by tag values (where) and aggregation by tag values (group by). Fuzzy-match filtering on binary tags is not supported at this time.
- Tag data (tag values) is directly associated with each subtable; one set of tag values (one or more, up to six) identifies one subtable (the same tag values can map to multiple subtables via automatic table creation when writing data).
- Tag values support Chinese, in which case the tag type must be NCHAR (verified in separate tests; see the sketch below).
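For instance, a hypothetical variant of the super table with Chinese-capable tags would be declared like this (the table name jiankong_cn is made up for illustration):

taos> create table jiankong_cn (ts timestamp, value int) tags (type nchar(10), subtype nchar(10));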
3. Create subtables (a subtable is an ordinary table whose structure is entirely determined by its super table):
create table jiankong_sub_send using jiankong tags ('send', 'send');
create table jiankong_sub_delivrd using jiankong tags ('delivrd', 'delivrd');
create table jiankong_sub_undeliv_db_0108 using jiankong tags ('undeliv', 'DB:0108');
create table jiankong_sub_undeliv_db_0107 using jiankong tags ('undeliv', 'DB:0107');
create table jiankong_sub_undeliv_balance using jiankong tags ('undeliv', 'BALANCE');
create table jiankong_sub_undeliv_id_0076 using jiankong tags ('undeliv', 'ID:0076');
create table jiankong_sub_undeliv_ib_0008 using jiankong tags ('undeliv', 'IB:0008');
Each combination of type and subtype is created as its own subtable (this is only a test, so not all of the hundreds of subtypes are created).
4. Insert data
INSERT INTO jiankong_sub_send VALUES (now, 3034, '1564', '109', '1272', '2', '201909231530');
INSERT INTO jiankong_sub_delivrd VALUES (now, 3034, '1564', '109', '1272', '2', '201909231530');
INSERT INTO jiankong_sub_undeliv_balance VALUES (now, 1179, '152', '106', '1000', '1', '201910071113');
INSERT INTO jiankong_sub_undeliv_id_0076 VALUES (now, 1165, '1785', '111', '1226', '1', '201910071415');
INSERT INTO jiankong_sub_undeliv_ib_0008 VALUES (now, 1165, '1785', '127', '1000', '2', '201910061727');
INSERT INTO jiankong_sub_undeliv_db_0108 VALUES (now, 90, '548', '123', '1237', '1', '201910061127');
INSERT INTO jiankong_sub_undeliv_db_0107 VALUES (now, 2261, '808', '116', '1314', '2', '201910032106');
The statements above insert simulated data into each of the seven subtables created earlier. Since a large amount of data has to be simulated, you need to write a shell script (or use some other method) to insert it; a sketch follows after the note below.
Note that you cannot write data directly into the STable; data must be written into the individual subtables.
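As mentioned above, a shell script can generate the bulk of the simulated data. Below is a minimal sketch under stated assumptions: the row count and value ranges are arbitrary, and each iteration inserts one random row per subtable through the taos CLI.

#!/bin/bash
# Insert random simulated rows into the seven subtables (illustrative values only).
tables="jiankong_sub_send jiankong_sub_delivrd jiankong_sub_undeliv_db_0108 \
jiankong_sub_undeliv_db_0107 jiankong_sub_undeliv_balance \
jiankong_sub_undeliv_id_0076 jiankong_sub_undeliv_ib_0008"
for i in $(seq 1 10000); do
  for t in $tables; do
    # gatewayid/companyid/provinceid/cityid are random strings, value is a random int
    taos -s "insert into jk.$t values (now, '$((RANDOM % 9000 + 1000))', '$((RANDOM % 2000))', '1$((RANDOM % 90 + 10))', '1$((RANDOM % 400))', $((RANDOM % 284 + 1)), '$(date +%Y%m%d%H%M)');" > /dev/null
  done
done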
5. Queries on database, table structure, etc.
Query database information:
taos> show databases;
 name | created time          | ntables | vgroups | replica | days | keep1,keep2,keep(D) | tables | rows | cache(b) | ablocks | tblocks | ctime(s) | clog | comp | time precision | status |
============================================================================================================================================================================================
 log  | 19-11-18 16:37:14.025 |       4 |       1 |       1 |   10 | 30,30,30            |     32 | 1024 |     2048 | 2.00000 |      32 |     3600 |    1 |    2 | us             | ready  |
 jk   | 19-11-18 16:48:19.867 |      10 |       1 |       1 |   10 | 365,365,365         |   1024 | 4096 |    16384 | 4.00000 |     100 |    3600 |    1 |    2 | us             | ready  |
Query OK, 1 row(s) in set (0.002487s)
Query Super Table:
taos> show stables;
   name   |      created_time     | columns | tags | tables |
=============================================================
 jiankong | 19-11-18 16:48:41.540 |       7 |    2 |      7 |
Query OK, 1 row(s) in set (0.002140s)
Query the table structure of the supertable:
taos> describe jiankong;
   Field    |   Type    | Length | Note |
=========================================
 ts         | TIMESTAMP |      8 |      |
 gatewayid  | BINARY    |      6 |      |
 companyid  | BINARY    |     20 |      |
 provinceid | BINARY    |     10 |      |
 cityid     | BINARY    |     10 |      |
 value      | INT       |      4 |      |
 timestr    | BINARY    |     30 |      |
 type       | BINARY    |     10 | tag  |
 subtype    | BINARY    |     10 | tag  |
Query OK, 9 row(s) in set (0.001301s)
The "tag" in the Note column indicates that the column is a tag.
Query subtables:
taos> show tables;
          table_name          |      created_time     | columns |  stable  |
============================================================================
 jiankong_sub_delivrd         | 19-11-18 16:49:17.009 |       7 | jiankong |
 jiankong_sub_undeliv_ib_0008 | 19-11-18 16:49:17.025 |       7 | jiankong |
 jiankong_sub_undeliv_db_0108 | 19-11-18 16:49:17.016 |       7 | jiankong |
 jiankong_sub_undeliv_db_0107 | 19-11-18 16:49:17.018 |       7 | jiankong |
 jiankong_sub_undeliv_id_0076 | 19-11-18 16:49:17.023 |       7 | jiankong |
 jiankong_sub_send            | 19-11-18 16:49:17.003 |       7 | jiankong |
 jiankong_sub_undeliv_balance | 19-11-18 16:49:17.021 |       7 | jiankong |
Query OK, 10 row(s) in set (0.007001s)
Query the table structure of specific subtables:
taos> describe jiankong_sub_undeliv_db_0108;
   Field    |   Type    | Length |  Note   |
============================================
 ts         | TIMESTAMP |      8 |         |
 gatewayid  | BINARY    |      6 |         |
 companyid  | BINARY    |     20 |         |
 provinceid | BINARY    |     10 |         |
 cityid     | BINARY    |     10 |         |
 value      | INT       |      4 |         |
 timestr    | BINARY    |     30 |         |
 type       | BINARY    |     10 | undeliv |
 subtype    | BINARY    |     10 | DB:0108 |
Query OK, 9 row(s) in set (0.001195s)
You can see "undeliv" (the type tag of the super table) and "DB:0108" (the subtype tag of the super table) in the Note column; these two static tag values identify this subtable.
6. Data Query
Group-aggregate query by type:
taos> select sum(value) from jk.jiankong group by type;
   sum(value)   |  type   |
===========================
       11827688 | delivrd |
       55566578 | send    |
       46687487 | undeliv |
Query OK, 3 row(s) in set (0.018251s)
Group-aggregate query by subtype:
taos> select sum(value) from jk.jiankong group by subtype;
   sum(value)   | subtype |
===========================
           9317 | BALANCE |
          65219 | DB:0107 |
        2077691 | DB:0108 |
        2804417 | IB:0008 |
       41730843 | ID:0076 |
       11827688 | delivrd |
       55566578 | send    |
Query OK, 7 row(s) in set (0.013978s)
Group-aggregate query by type and subtype:
taos> select sum(value) from jk.jiankong group by type, subtype;
   sum(value)   |  type   | subtype |
=====================================
       11827688 | delivrd | delivrd |
       55566578 | send    | send    |
           9317 | undeliv | BALANCE |
          65219 | undeliv | DB:0107 |
        2077691 | undeliv | DB:0108 |
        2804417 | undeliv | IB:0008 |
       41730843 | undeliv | ID:0076 |
Query OK, 7 row(s) in set (0.732830s)
Group-aggregate query by type and subtype, per day:
taos> select sum(value) from jk.jiankong interval(1d) group by type, subtype;
            ts            |   sum(value)   |  type   | subtype |
================================================================
 19-11-18 00:00:00.000000 |        1760800 | delivrd | delivrd |
 19-11-19 00:00:00.000000 |          14768 | delivrd | delivrd |
 19-11-20 00:00:00.000000 |        3290720 | delivrd | delivrd |
 19-11-21 00:00:00.000000 |        4973640 | delivrd | delivrd |
 19-11-22 00:00:00.000000 |        1787760 | delivrd | delivrd |
 19-11-18 00:00:00.000000 |       36976790 | send    | send    |
 19-11-19 00:00:00.000000 |         310128 | send    | send    |
 19-11-20 00:00:00.000000 |        9482760 | send    | send    |
 19-11-21 00:00:00.000000 |        6470940 | send    | send    |
 19-11-22 00:00:00.000000 |        2325960 | send    | send    |
 19-11-18 00:00:00.000000 |           6200 | undeliv | BALANCE |
 19-11-19 00:00:00.000000 |             52 | undeliv | BALANCE |
 19-11-20 00:00:00.000000 |           1590 | undeliv | BALANCE |
 19-11-21 00:00:00.000000 |           1085 | undeliv | BALANCE |
 19-11-22 00:00:00.000000 |            390 | undeliv | BALANCE |
 19-11-18 00:00:00.000000 |          43400 | undeliv | DB:0107 |
 19-11-19 00:00:00.000000 |            364 | undeliv | DB:0107 |
 19-11-20 00:00:00.000000 |          11130 | undeliv | DB:0107 |
 19-11-21 00:00:00.000000 |           7595 | undeliv | DB:0107 |
 19-11-22 00:00:00.000000 |           2730 | undeliv | DB:0107 |
 19-11-18 00:00:00.000000 |        1382600 | undeliv | DB:0108 |
 19-11-19 00:00:00.000000 |          11596 | undeliv | DB:0108 |
 19-11-20 00:00:00.000000 |         354570 | undeliv | DB:0108 |
 19-11-21 00:00:00.000000 |         241955 | undeliv | DB:0108 |
 19-11-22 00:00:00.000000 |          86970 | undeliv | DB:0108 |
 19-11-18 00:00:00.000000 |        1866200 | undeliv | IB:0008 |
 19-11-19 00:00:00.000000 |          15652 | undeliv | IB:0008 |
 19-11-20 00:00:00.000000 |         478590 | undeliv | IB:0008 |
 19-11-21 00:00:00.000000 |         326585 | undeliv | IB:0008 |
 19-11-22 00:00:00.000000 |         117390 | undeliv | IB:0008 |
 19-11-18 00:00:00.000000 |       27769800 | undeliv | ID:0076 |
 19-11-19 00:00:00.000000 |         232908 | undeliv | ID:0076 |
 19-11-20 00:00:00.000000 |        7121610 | undeliv | ID:0076 |
 19-11-21 00:00:00.000000 |        4859715 | undeliv | ID:0076 |
 19-11-22 00:00:00.000000 |        1746810 | undeliv | ID:0076 |
Query OK, 35 row(s) in set (0.023865s)
Here interval is the length of the aggregation window; the minimum interval is 10 milliseconds (written 10a).
A group-by aggregate query that does not go through a super table (i.e. on an ordinary table) results in an error:
taos> select sum(value) from jk.jiankong group by type;
TSDB error: invalid SQL: group by only available for STable query
7. Automatically create subtables when writing data
Another requirement: because there are hundreds of static values to monitor and they are not all known in advance, we cannot create every subtable up front when creating the database and tables.
The following is an excerpt from the official website:
In some special scenarios, the user may not be sure whether a device's table exists when writing data. In that case the automatic table-creation syntax can be used: when writing, a non-existent subtable is created automatically using the structure defined by the super table, and if the table already exists no new table is created.
Note: automatic table creation can only create subtables, never super tables, so the super table must have been defined beforehand. The syntax is almost identical to insert/import, except that the super table and tag information are added. The specific syntax is as follows:
INSERT INTO <tb_name> USING <stb_name> TAGS (<tag1_value>, ...) VALUES (field_value, ...) (field_value, ...) ...;
For comparison, the same subtable created explicitly with create:
create table jiankong_sub_send using jiankong tags ('send', 'send');
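Putting the two together: a row for a brand-new subtype can be written without creating its subtable first. A sketch (the subtable name and the tag value 'MK:0001' are invented for illustration):

taos> insert into jk.jiankong_sub_undeliv_mk_0001 using jk.jiankong tags ('undeliv', 'MK:0001') values (now, '1001', '42', '110', '1000', 1, '201911181200');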
III. Installing and Configuring Grafana
1. Install
Download the Grafana rpm package from the official site at https://grafana.com/grafana/download, then install it:
rpm -ivh grafana-6.4.4-1.x86_64.rpm
2. Copy the TDengine plugin into Grafana's plugin directory
The Grafana plugin for TDengine ships in the /usr/local/taos/connector/grafana directory of the installation:
cp -r /usr/local/taos/connector/grafana/tdengine/ /var/lib/grafana/plugins
3. Start Grafana
systemctl start grafana-server
4. Log on to the Grafana server via host:3000 in your browser
The default username and password are both admin.
5. Add TDengine Data Source
Find "TDengine" at the bottom
Name: "TDengine" (any other name is fine)
Host: Test server address "http://192.168.8.66:6020"
User: Default is "root"
Password: default is "taosdata"
Test it:
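Besides the built-in test button, the connection can be sanity-checked from the command line through TDengine's RESTful interface (assuming this version serves REST on port 6020; adjust the address to your server):

curl -u root:taosdata -d 'show databases;' http://192.168.8.66:6020/rest/sql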
6. Add Folder
Put dashboards of the same type into one folder.
7. Add Dashboard
Create Dashboard after entering the Folder you just created
Enter the SQL statement in the INPUT field. Note the position of fill: it must come before group by, otherwise an error is reported. See the example below.
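For example, a sketch of a valid ordering (the one-minute window and the zero fill value are arbitrary choices for illustration):

taos> select sum(value) from jk.jiankong interval(1m) fill(value, 0) group by type, subtype;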
Configure graphical display
Choose graphs, tables, and other panel types as required.
Configure the curve chart here. Below the chart are the detailed display settings, such as markers, fill, and custom colors for the displayed fields.
Give this dashboard an easily recognizable name:
8. The six configured dashboards
9. Examples without and with group by
Without group by, multiple SQL statements have to be written, distinguished by their where conditions; with more categories this quickly becomes cumbersome and inflexible.
With group by, a single SQL statement solves the problem:
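For example (a sketch; the Grafana time-range condition is omitted for brevity), instead of one query per subtype such as

taos> select sum(value) from jk.jiankong where subtype = 'DB:0108' interval(1m);

a single grouped query covers them all:

taos> select sum(value) from jk.jiankong interval(1m) group by subtype;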
The following figure shows that the trends are identical with and without group by.
IV. Preliminary Exploration of Advanced Features, and Some Thoughts
1. Continuous queries: user-level precomputation
I think TDengine's concept of precomputation is excellent. From the official documentation:
To effectively improve query performance, TDengine exploits the immutable nature of IoT data by recording, for each saved data block, statistics such as the maximum and minimum of the data in that block. If a query only involves data blocks in their entirety, these precomputed results are used directly and the block contents are never read. Since the precomputed index is much smaller than the actual data stored on disk, using it greatly reduces read I/O and speeds up queries that are bottlenecked on disk I/O.
The official documentation on continuous queries, excerpted:
Sliding-window-based stream computing
Continuous queries are queries that TDengine executes automatically and periodically, computed over a sliding window; they are a simplified, time-driven form of stream computing. For a table or super table in a database, TDengine provides continuous queries that run automatically on a schedule; the user can have the results pushed out or written back into TDengine. Each execution computes over a time window that slides forward as time passes. When defining a continuous query, you must specify the window size (parameter interval) and the forward slide step (parameter sliding).
Writing the results back into TDengine here is effectively user-level precomputation: TDengine performs the aggregation in the background according to the user-defined time window and slide step, and when the user queries the data, it is read directly from the result table.
Create continuous queries:
taos> create table test_stream_sum as select sum(value) from jiankong interval(20s) sliding(10s) group by type, subtype;
Query OK, 1 row(s) affected (0.000983s)
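A side note from the general documentation (I did not exercise this in the test, so treat it as an assumption): running continuous queries can reportedly be listed with show streams and stopped with kill stream, which is handy while experimenting:

taos> show streams;
taos> kill stream <stream-id>;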
The actual output of the select part of the SQL for the continuous query above is:
taos> select sum(value) from jiankong interval(20s) group by type, subtype;
            ts            |   sum(value)   |  type   | subtype |
================================================================
 19-11-18 16:50:40.000000 |           9088 | delivrd | delivrd |
 19-11-18 16:51:00.000000 |          31808 | delivrd | delivrd |
 19-11-18 16:51:20.000000 |          15904 | delivrd | delivrd |
 19-11-18 16:52:20.000000 |          12212 | delivrd | delivrd |
 19-11-18 16:52:40.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:53:00.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:53:20.000000 |          31808 | delivrd | delivrd |
 19-11-18 16:53:40.000000 |          31240 | delivrd | delivrd |
 19-11-18 16:54:00.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:54:20.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:54:40.000000 |          31240 | delivrd | delivrd |
 19-11-18 16:55:00.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:55:20.000000 |          28400 | delivrd | delivrd |
 19-11-18 16:55:40.000000 |          31808 | delivrd | delivrd |
 19-11-18 16:56:00.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:56:20.000000 |          31240 | delivrd | delivrd |
 19-11-18 16:56:40.000000 |          31524 | delivrd | delivrd |
 19-11-18 16:57:00.000000 |          32092 | delivrd | delivrd |
 19-11-18 16:57:20.000000 |          31240 | delivrd | delivrd |
 19-11-18 16:57:40.000000 |          32092 | delivrd | delivrd |
 19-11-18 16:58:00.000000 |          31240 | delivrd | delivrd |
 19-11-18 16:58:20.000000 |          22720 | delivrd | delivrd |
 19-11-18 16:50:40.000000 |         190848 | send    | send    |
 ...
The actual data in the table automatically created by the continuous query is:
taos> select * from test_stream_sum;
            ts            | sum_value_ |
========================================
 19-11-18 17:17:30.000000 |       2556 |
 19-11-18 17:17:40.000000 |      18460 |
 19-11-18 17:17:50.000000 |      15904 |
 19-11-18 17:18:00.000000 |      15620 |
Query OK, 4 row(s) in set (0.000431s)
The result above is not what I expected: the aggregation is not broken out by the group by fields I defined.
I raised an issue on GitHub, and a TDengine engineer replied: "Continuous queries currently do not support group by very well; this problem is already on our roadmap and will be improved in the future." The reason is that the result table could end up with duplicate timestamp primary keys (different group by values can share the same time), which would conflict; that is why it is temporarily unsupported.
Although this feature does not yet meet my expectations, the engineer's prompt reply was reassuring, and I look forward to the improvement.
2. Further thoughts
TDengine is designed to serve the Internet of Things: structured, strongly time-ordered data. Data close to that profile is a good fit for it; other kinds of data are not.
Judging from TDengine's architecture and storage design, I feel it is not limited to 100% time-series workloads; some detail-level queries could also be stored in and served from TDengine. I will test this later.