A Minimal Getting-Started Guide to CAT (Application Monitoring Platform)

1. Overview

Lazy tip: the overview in the official CAT documentation is already very well written, so this section shamelessly copies and pastes from it~

CAT (Central Application Tracking) is a real-time application monitoring platform developed in Java. It provides comprehensive real-time monitoring and alerting services for Meituan-Dianping.

As a basic component of server-side projects, CAT provides clients in multiple languages, including Java, C/C++, Node.js, Python, and Go. It has been deeply integrated into Meituan-Dianping's infrastructure middleware frameworks (MVC framework, RPC framework, database framework, cache framework, message queue, configuration system, etc.), providing Meituan-Dianping's business lines with rich performance indicators, health status, real-time alerts, and so on.

The great advantage of CAT is that it is a real-time system. Most CAT reports are minute-level statistics, but the delay from data generation to the end of server-side processing is at the second level. "Second level" means, for example, that at 48 minutes 40 seconds we can basically already see the data from 48 minutes 38 seconds. The statistical granularity of the reports as a whole is at the minute level.

The second advantage is that monitoring data is collected in full and pre-aggregated on the client, while link (trace) data is sampled.

1.1 CAT product value

Reduce fault discovery time
Reduce the cost of fault location
Assist with application optimization

1.2 CAT advantages

Real-time processing: the value of information drops sharply over time, especially during incident handling
Full data: indicator data is collected in full, which supports in-depth analysis of fault cases
High availability: fault recovery and problem location depend on highly available monitoring
Fault tolerance: a fault does not affect the normal operation of the business and is transparent to the business
High throughput: collecting massive amounts of monitoring data requires high throughput
Scalable: supports distributed, cross-IDC deployment and a horizontally scalable monitoring system

2. Single machine deployment

In this section, we refer to the official CAT document "Cluster deployment" to deploy a minimal CAT service, suitable for demonstration, learning, and test environments.

In a stand-alone deployment, we do not need to deploy a Hadoop environment; CAT monitoring data is stored directly on disk, so the setup is relatively simple. Below, we use CentOS 7.x to deploy the CAT server.

Linux 2.6 or above (epoll is only supported from the 2.6 kernel onward). For online server deployment, use a Linux environment; Mac and Windows are fine as development environments. Meituan-Dianping uses CentOS 6.5 internally.
The Hadoop environment is optional. Smaller companies are generally advised to use disk mode directly: request a CAT server with a 500 GB or larger disk, mounted at the /data/ directory.

Note that the internal IP of the server used here is 172.16.48.185. Friends, remember to replace it with the internal IP of your own server!!!

2.1 Download

CAT has officially provided the war package of CAT server, so we can directly execute the following command to download:

# Download war package
$ wget http://unidal.org/nexus/service/local/repositories/releases/content/com/dianping/cat/cat-home/3.0.0/cat-home-3.0.0.war
# Rename to cat.war
$ mv cat-home-3.0.0.war cat.war

2.2 Configuration

Before starting the CAT server, we need to do some configuration.

① Import the script/CatApplication.sql script to initialize the database. After importing, the data tables are as follows:

② Create the CAT directories and grant permissions. Execute the following commands:

# Create the CAT configuration directory
$ mkdir -p /data/appdatas/cat
# Create the CAT log directory
$ mkdir -p /data/applogs/cat
# Grant permissions
$ chmod 777 /data/appdatas/cat -R
$ chmod 777 /data/applogs/cat -R

③ In the /data/appdatas/cat directory, create the CAT client configuration file client.xml. The details are as follows:

The purpose of this configuration file is to give every client an address that points to the CAT server.
This file can be deployed and maintained centrally by operations, for example with an O&M tool such as Puppet.
Different environments use different files, for example to distinguish the prod environment from the test environment. Meituan-Dianping runs two sets of CAT environments internally: one for production and one for testing.
Note: if the routing is wrong and a client_cache.xml file exists in this directory, delete client_cache.xml and restart the service.

<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
    <servers>
        <server ip="172.16.48.185" port="2280" http-port="8080"/>
    </servers>
</config>

2280 is the default port on which the CAT server receives data; it cannot be modified.
http-port is the port on which Tomcat starts. The default is 8080, and using the default port is recommended.

In this case, we have a stand-alone deployment, so there are no other servers in the cluster.
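
As a quick sanity check of the client.xml above, the following minimal Python sketch parses the file content and lists the server addresses a client would be pointed at. The helper name list_cat_servers is hypothetical and only for illustration; it is not part of CAT.

```python
import xml.etree.ElementTree as ET

# Same content as the client.xml shown above; replace the IP with your own server's.
CLIENT_XML = """<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
    <servers>
        <server ip="172.16.48.185" port="2280" http-port="8080"/>
    </servers>
</config>
"""


def list_cat_servers(xml_text):
    """Return (ip, port, http_port) for every <server> element in a client.xml string."""
    # Passing bytes lets the parser apply the encoding named in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (s.get("ip"), int(s.get("port")), int(s.get("http-port")))
        for s in root.findall("./servers/server")
    ]


servers = list_cat_servers(CLIENT_XML)
print(servers)
```

In a stand-alone deployment the list should contain exactly one entry, the CAT server itself.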

④ In the /data/appdatas/cat directory, create the CAT server configuration file server.xml. The details are as follows:

<?xml version="1.0" encoding="utf-8"?>
<config local-mode="false" hdfs-machine="false" job-machine="true" alert-machine="true">
    <storage local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="7" local-logivew-storage-time="7">
    </storage>
    <console default-domain="Cat" show-cat-domain="true">
        <remote-servers>172.16.48.185:8080</remote-servers>
    </console>
</config>

😈 Friendly tip: the following introduction to the server and storage models is a bit lengthy; feel free to skip it~

Server model: represents the configuration of a machine. If id is default, it is the default configuration; if id is an IP, it is the configuration of that server.

Property local-mode: defines whether the service runs in local mode (development mode). In the production environment, set it to false to start the remote listening mode. The default is false.
Property hdfs-machine: defines whether to enable HDFS storage mode. The default is false.
Property job-machine: defines whether the current service is a report machine, which runs the tasks that generate summary and statistical reports (only one server needs to enable this). The default is false.
Property alarm-machine: defines whether the current service is an alarm machine, which runs the various alarm monitors (only one server needs to enable this). The default is false.
Property send-machine: defines whether the current service sends alarms. It exists to handle the case where the alarm thread is started in the test environment but no notification should be sent, and it will be gradually removed later. It is recommended to set it to true whenever alarm-machine is set to true.

Storage model: defines the data storage configuration.

Property local-report-storage-time: defines how long local reports are stored, in days.
Property local-logivew-storage-time: defines how long local logs are stored, in days.
Property local-base-dir: defines the local data storage directory, which is also the path where source files are looked up when uploading to HDFS.
Property hdfs: defines the HDFS configuration used to log in to the system directly.
Property server-uri: defines the HDFS service address; an HDFS Nameservice can be configured here.
Property console: defines the service console information.
Property remote-servers: defines the list of HTTP services.
ldap: defines the LDAP configuration (this can be ignored).
ldapUrl: defines the LDAP service address (this can be ignored).

In this case, we only need to fill in the intranet IP of the current CAT server.

⑤ In the /data/appdatas/cat directory, create the CAT database configuration file datasources.xml. The details are as follows:

<?xml version="1.0" encoding="utf-8"?>
<data-sources>
    <data-source id="cat">
        <maximum-pool-size>3</maximum-pool-size>
        <connection-timeout>1s</connection-timeout>
        <idle-timeout>10m</idle-timeout>
        <statement-cache-size>1000</statement-cache-size>
        <properties>
            <driver>com.mysql.jdbc.Driver</driver>
            <url><![CDATA[jdbc:mysql://rm-uf60u8c6vnfx2q4m4.mysql.rds.aliyuncs.com:3306/demo_cat]]></url>
            <user>demo_cat</user>
            <password>Wwb626583</password>
            <connectionProperties><![CDATA[useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&socketTimeout=120000]]></connectionProperties>
        </properties>
    </data-source>
</data-sources>

2.3 Startup

① Because the CAT server is provided as a war package, we need to download Tomcat. We use Tomcat 9.x here. Execute the following commands:

# download
$ wget http://mirror.cc.columbia.edu/pub/software/apache/tomcat/tomcat-9/v9.0.35/bin/apache-tomcat-9.0.35.tar.gz
# decompression
$ tar -zxvf apache-tomcat-9.0.35.tar.gz

Tomcat is the recommended J2EE container; version 7.x.x or 8.0 is recommended.

② Copy the war package provided by the CAT server to the webapps directory of Tomcat. Execute the command as follows:

$ cp cat.war apache-tomcat-9.0.35/webapps/

③ Modify Tomcat's server configuration file server.xml to set the encoding to UTF-8 and avoid garbled characters. The modified content is as follows:

$ vi apache-tomcat-9.0.35/conf/server.xml

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="utf-8" />

④ Install OpenJDK 1.8 by executing the command yum install java-1.8.0-openjdk.

Java 6, 7, and 8 are supported; JDK 7 is recommended for the server.

The client supports JDK 6, 7, and 8.

⑤ Start the Tomcat server by executing the command sh apache-tomcat-9.0.35/bin/startup.sh.
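
Since CAT may take a while to start, manual refreshing can be replaced by a small polling script. The following is a minimal Python sketch, not part of CAT: the helper names http_probe and wait_until_ready, the retry count, and the delay are all assumptions chosen for illustration.

```python
import http.client
import time
import urllib.request


def http_probe(url, timeout=3.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses; connection refused lands here too.
        return False


def wait_until_ready(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return the 1-based attempt that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For the server in this article, a call would look like wait_until_ready(lambda: http_probe("http://172.16.48.185:8080/cat")), with the IP replaced by your own server's internal IP.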

2.4 Secondary configuration

① After startup, visit http://172.16.48.185:8080/cat to reach the login page of the CAT console. The built-in administrator account of CAT is "admin/admin". After logging in, you enter the home page of the CAT console.

Friendly tip: CAT may start slowly, so friends can just keep refreshing~

At this point, the page reports the error "service side of CAT with problem: [127.0.0.1]", which we need to fix.

② Click the "Configs" menu at the top, then select "global system configuration -> client routing" in the menu on the left. Modify the content as shown below and click the "submit" button:

<?xml version="1.0" encoding="utf-8"?>
<router-config backup-server="172.16.48.185" backup-server-port="2280">
    <default-server id="172.16.48.185" weight="1.0" port="2280" enable="true"/>
    <network-policy id="default" title="default" block="false" server-group="default_group">
    </network-policy>
    <server-group id="default_group" title="default-group">
        <group-server id="172.16.48.185"/>
    </server-group>
    <domain id="cat">
        <group id="default">
            <server id="172.16.48.185" port="2280" weight="1.0"/>
        </group>
    </domain>
</router-config>

③ Next, click "global system configuration -> server configuration" in the menu on the left, modify the content as shown below, and click the "submit" button:

<?xml version="1.0" encoding="utf-8"?>
<server-config>
    <server id="default">
        <properties>
            <property name="local-mode" value="false"/>
            <property name="job-machine" value="false"/>
            <property name="send-machine" value="false"/>
            <property name="alarm-machine" value="false"/>
            <property name="hdfs-enabled" value="false"/>
            <property name="remote-servers" value="172.16.48.185:8080"/>
        </properties>
        <storage local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="2" local-logivew-storage-time="1" har-mode="true" upload-thread="5">
            <hdfs id="dump" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="/user/cat/dump"/>
            <harfs id="dump" max-size="128M" server-uri="har://127.0.0.1/" base-dir="/user/cat/dump"/>
            <properties>
                <property name="hadoop.security.authentication" value="false"/>
                <property name="dfs.namenode.kerberos.principal" value="hadoop/dev80.hadoop@testserver.com"/>
                <property name="dfs.cat.kerberos.principal" value="cat@testserver.com"/>
                <property name="dfs.cat.keytab.file" value="/data/appdatas/cat/cat.keytab"/>
                <property name="java.security.krb5.realm" value="value1"/>
                <property name="java.security.krb5.kdc" value="value2"/>
            </properties>
        </storage>
        <consumer>
            <long-config default-url-threshold="1000" default-sql-threshold="100" default-service-threshold="50">
                <domain name="cat" url-threshold="500" sql-threshold="500"/>
                <domain name="OpenPlatformWeb" url-threshold="100" sql-threshold="500"/>
            </long-config>
        </consumer>
    </server>
    <server id="172.16.48.185">
        <properties>
            <property name="job-machine" value="true"/>
            <property name="send-machine" value="true"/>
            <property name="alarm-machine" value="true"/>
        </properties>
    </server>
</server-config>

④ After the modification, restart the Tomcat server so that the new CAT server configuration takes effect. After the restart is complete, visit http://172.16.48.185:8080/cat again; the CAT console should no longer report the error.

Friendly tip: CAT may start slowly, so friends can just keep refreshing~
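
The server configuration in step ③ has one server block with id "default" and one with the server's IP. The following minimal Python sketch shows one way to read it: start from the default properties and let the IP-specific block override them. This override order is an assumption based on the description of the server model, not a statement of how CAT resolves the configuration internally; the function names are hypothetical, and the XML is trimmed to the properties that matter here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the server configuration from step 3 (storage and consumer omitted).
SERVER_CONFIG = """<server-config>
    <server id="default">
        <properties>
            <property name="local-mode" value="false"/>
            <property name="job-machine" value="false"/>
            <property name="send-machine" value="false"/>
            <property name="alarm-machine" value="false"/>
        </properties>
    </server>
    <server id="172.16.48.185">
        <properties>
            <property name="job-machine" value="true"/>
            <property name="send-machine" value="true"/>
            <property name="alarm-machine" value="true"/>
        </properties>
    </server>
</server-config>
"""


def read_properties(server):
    """Collect name -> value from the <properties> directly under a <server> element."""
    # The relative path skips the <properties> nested inside <storage> in the full file.
    return {p.get("name"): p.get("value") for p in server.findall("./properties/property")}


def effective_properties(xml_text, server_id):
    """Assumed resolution order: defaults first, then the block whose id matches server_id."""
    root = ET.fromstring(xml_text)
    merged = {}
    for wanted in ("default", server_id):
        for server in root.findall("./server"):
            if server.get("id") == wanted:
                merged.update(read_properties(server))
    return merged


props = effective_properties(SERVER_CONFIG, "172.16.48.185")
print(props)
```

Under this reading, the single machine in this article acts as report machine and alarm machine, while any other IP would fall back to the defaults.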

3. Cluster deployment

See the following article to complete the CAT cluster deployment:

Official CAT document - cluster deployment

I'll be lazy here for now, because I want to focus on integrating the CAT client!

4. Model design

Before integrating the CAT client into an application, we need to read the official CAT document "Model design" to understand CAT's four monitoring models: Transaction, Event, Heartbeat, and Metric.

4.1 Model 1: Transaction

Suitable for recording program access behavior that crosses system boundaries, such as remote calls and database calls, and for monitoring long-running business logic.

A Transaction records the execution time and the number of executions of a piece of code.

For further understanding, see the official CAT document "Transaction report".
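
To make "execution time and number of executions" concrete, here is a minimal Python sketch of the idea behind a Transaction. It is not the CAT client API: the transaction context manager, the stats dictionary, and the type/name values are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# (type, name) -> counters; a stand-in for what a Transaction report aggregates.
stats = defaultdict(lambda: {"count": 0, "fail": 0, "total_ms": 0.0})


@contextmanager
def transaction(type_, name, clock=time.perf_counter):
    """Time the wrapped block and record one execution under (type_, name)."""
    entry = stats[(type_, name)]
    start = clock()
    try:
        yield
    except Exception:
        entry["fail"] += 1  # the block raised, so count it as a failed execution
        raise
    finally:
        entry["count"] += 1
        entry["total_ms"] += (clock() - start) * 1000.0


with transaction("SQL", "user.select"):
    time.sleep(0.01)  # stands in for a database call

print(dict(stats))
```

Wrapping a remote call or a database call this way yields the two numbers the model is about: how often the code ran and how long it took.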

4.2 Model 2: Event

Used to record the number of times an event occurs, such as system exceptions.

Compared with a Transaction, an Event has no timing statistics and therefore costs less.

For further understanding, see the official CAT document "Event report".

In addition, CAT generates a Problem report, described in the CAT document "Problem report"; friends can have a look~
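
The difference from a Transaction can be sketched in a few lines of Python: an Event only counts occurrences and measures nothing. The log_event helper and the type/name values are hypothetical and not the CAT client API.

```python
from collections import Counter

# (type, name) -> number of occurrences; no timing is kept.
event_counts = Counter()


def log_event(type_, name):
    """Count one occurrence of an event."""
    event_counts[(type_, name)] += 1


for _ in range(3):
    try:
        int("not a number")  # stands in for business code that throws
    except ValueError as exc:
        log_event("Exception", type(exc).__name__)

print(event_counts)
```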

4.3 Model 3: Heartbeat

Represents statistics generated periodically within a program, such as CPU utilization, memory utilization, connection pool status, system load, and so on.

For further understanding, see the official CAT document "Heartbeat report".

4.4 Model 4: Metric

Used to record business indicators. For each indicator, the recorded data may include a count, an average, and a sum. The minimum statistical granularity of business indicators is 1 minute.

For further understanding, see the official CAT document "Business report".
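
The one-minute granularity can be illustrated with a minimal Python sketch that groups raw samples into minute buckets and derives count, sum, and average for each. The function name and the sample data are hypothetical; this only illustrates the aggregation idea, not CAT's implementation.

```python
from collections import defaultdict


def aggregate_by_minute(samples):
    """Group (epoch_seconds, value) samples into one-minute buckets with count, sum, and avg."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, value in samples:
        minute_start = int(ts) - int(ts) % 60  # round down to the start of the minute
        buckets[minute_start]["count"] += 1
        buckets[minute_start]["sum"] += value
    return {
        minute: {"count": b["count"], "sum": b["sum"], "avg": b["sum"] / b["count"]}
        for minute, b in sorted(buckets.items())
    }


# Hypothetical order amounts: three in the first minute, two in the second.
samples = [(0, 10.0), (30, 20.0), (59, 30.0), (60, 5.0), (119, 15.0)]
report = aggregate_by_minute(samples)
print(report)
```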

4.5 Message tree

The CAT monitoring system encapsulates the internal execution of each URL and Service request into a complete message tree. The message tree may include Transaction, Event, Heartbeat, Metric and other information.

① Complete message tree

② Visual message tree

③ Distributed message tree [one machine calls another machine]
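
As a rough picture of what a message tree looks like, the following minimal Python sketch builds a small tree for one request and prints it with indentation. The Message class, its fields, and all type/name values are hypothetical. For simplicity any node may hold children here, whereas in the description above it is the request that wraps the nested messages.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    kind: str  # "Transaction", "Event", "Heartbeat", or "Metric"
    type: str
    name: str
    children: List["Message"] = field(default_factory=list)

    def add(self, child):
        """Attach a child message and return it so calls can be nested."""
        self.children.append(child)
        return child


def render(node, depth=0):
    """Return the tree as indented lines, one message per line."""
    lines = ["  " * depth + f"{node.kind} {node.type} {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# One URL request that makes a remote call and a database call.
root = Message("Transaction", "URL", "/order/create")
rpc = root.add(Message("Transaction", "Call", "inventory.reserve"))
rpc.add(Message("Event", "RemoteCall", "inventory-host"))
root.add(Message("Transaction", "SQL", "order.insert"))
print("\n".join(render(root)))
```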

5. Application access

Refer to the article "Introduction to Spring Boot monitoring platform CAT" to integrate the CAT client into an application and implement monitoring.

6. Alarm

Refer to the official CAT document "Alarm configuration" to implement CAT's alarm function.

In addition, you can follow the cat-alert integration approach to implement CAT alarms via WeChat.