Using Flink Metric Reporters to monitor Flink job metrics

Since Flink 1.8, a metric reporter can write metric data to InfluxDB, and users can read that data from InfluxDB for visualization.

Most small and medium-sized companies, however, do not build their own visualization because of the cost. We chose Grafana to visualize Flink metrics.

This article focuses on the InfluxDB and Prometheus reporters: Flink's metric data is written to these external systems, and Grafana is used for visualization.

The installation and configuration steps are walked through one by one below:

1. influxdb

1.1 start up

docker run -p 8086:8086 \
    -v /data/docker_volume/influxdb:/var/lib/influxdb \
    influxdb

1.2 connect to influxdb

# e9b352ee20d4 is the container ID on the author's machine; look up your own with docker ps
docker exec -it e9b352ee20d4 influx

1.3 create the database

create database flink

1.4 create a user

create user "flink" with password 'flink#123centos' with all privileges;
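
The database and user created above can also be checked from outside the container through the HTTP query endpoint of InfluxDB. The following is a minimal sketch, assuming an InfluxDB 1.x server and curl on the host. The function names and the INFLUX_* environment variables are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the /query endpoint URL of an InfluxDB 1.x server.
# $1 = host (default localhost), $2 = port (default 8086)
influx_query_url() {
  printf 'http://%s:%s/query\n' "${1:-localhost}" "${2:-8086}"
}

# Hypothetical helper: run one InfluxQL statement over HTTP.
# Needs curl and a reachable InfluxDB; INFLUX_PASSWORD must be set by the caller.
influx_query() {
  if [ -z "${INFLUX_PASSWORD:-}" ]; then
    echo "set INFLUX_PASSWORD first" >&2
    return 1
  fi
  curl -sS -G "$(influx_query_url "${INFLUX_HOST:-localhost}" "${INFLUX_PORT:-8086}")" \
    -u "${INFLUX_USER:-flink}:${INFLUX_PASSWORD}" \
    --data-urlencode "q=$1"
}
```

With INFLUX_PASSWORD exported, running influx_query 'SHOW DATABASES' should return a JSON document that lists the flink database if step 1.3 succeeded.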

2. Prometheus

2.1 download prometheus and pushgateway

https://prometheus.io/download/

2.2 installation

Extract the prometheus and pushgateway archives.

2.3 configuration

vim prometheus.yml
# Add at the end, under scrape_configs:

  # pushgateway
  - job_name: 'pushgateway'
    scrape_interval: 10s
    honor_labels: true # keep the labels carried by the pushed metrics (such as job and instance) instead of overwriting them with the labels of the pushgateway scrape target
    static_configs:
     - targets: ['localhost:9091']
       labels:
         instance: pushgateway

2.4 start up

./prometheus  > /dev/null 2>&1 &

./pushgateway --web.enable-admin-api > /dev/null 2>&1 &
The --web.enable-admin-api flag enables managing data through the web API. With it enabled, you can delete metrics in the webUI, or delete all metrics with the command curl -X PUT http://localhost:9091/api/v1/admin/wipe

2.5 verification

# prometheus:
Open http://10.42.63.116:9090/targets

You should see pushgateway listed among the targets.

# pushgateway:
Open http://10.42.63.116:9091/

You should see the metrics written by flink (the flink task needs to be restarted first)
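
The same check can be made from the command line, because the pushgateway serves everything it has received in Prometheus text format on its /metrics path. The sketch below counts the samples whose name starts with flink_. It assumes the reporter's usual flink_ name prefix, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count metric samples whose name starts with "flink_"
# in Prometheus text-format input read from stdin.
# Comment lines (# HELP / # TYPE) start with "#" and are therefore not counted.
count_flink_samples() {
  # grep -c still prints 0 when nothing matches, but exits non-zero;
  # "|| true" keeps the helper's exit status clean in that case.
  grep -c '^flink_' || true
}
```

Usage, assuming the pushgateway address used in this article: curl -s http://10.42.63.116:9091/metrics | count_flink_samples. A result of 0 suggests the flink task has not pushed anything yet.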

3. flink configuration

3.1 modify the flink configuration file

vim flink-1.10.0/conf/flink-conf.yaml 

# Configure influxdb
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: 10.42.63.116
metrics.reporter.influxdb.port: 8086
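# db, username and password must match the database and user created in influxdb
# Caution: the flink-conf.yaml parser of this flink version appears to treat everything after a '#' as a comment, so a password containing '#' may be cut short; if authentication fails, try a password without '#'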
metrics.reporter.influxdb.db: flink
metrics.reporter.influxdb.username: flink
metrics.reporter.influxdb.password: flink#123centos
#metrics.reporter.influxdb.retentionPolicy: one_hour
#metrics.reporter.influxdb.consistency: ANY 
#metrics.reporter.influxdb.connectTimeout: 60000
#metrics.reporter.influxdb.writeTimeout: 60000

# Configure prometheus
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: 10.42.63.116
metrics.reporter.promgateway.port: 9091 
# jobName is specified directly, and does not need to be configured in prometheus in advance
metrics.reporter.promgateway.jobName: tdflink_prom
metrics.reporter.promgateway.randomJobNameSuffix: true
# Whether to delete the metrics from the pushgateway when the flink task shuts down (the Flink documentation lists the default as true). Even when it is set to true, deletion does not work reliably, see https://issues.apache.org/jira/browse/FLINK-11457 ; leftover metrics can be deleted through the webUI or api of pushgateway
metrics.reporter.promgateway.deleteOnShutdown: true

# Collect operating system metrics
# Flag indicating whether Flink should report system resource metrics such as machine's CPU, memory or network usage.
metrics.system-resource: true

3.2 copy jar package

Copy the influxdb and prometheus reporter jars from flink-1.10.0/opt to the lib directory:

cp opt/flink-metrics-influxdb-1.10.0.jar ./lib
cp opt/flink-metrics-prometheus-1.10.0.jar ./lib

To report operating system metrics, download the following jars and put them in the lib directory as well:

jna-4.2.2.jar
jna-platform-4.2.2.jar
oshi-core-3.4.0.jar
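
Before starting the task, it can help to confirm that all five jars named in this step are actually in place. The following is a minimal sketch; the function name is hypothetical, and the jar file names are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected jar that is missing from a flink lib directory.
# $1 = path to the lib directory (default ./lib)
missing_metric_jars() {
  lib_dir="${1:-./lib}"
  for jar in \
    flink-metrics-influxdb-1.10.0.jar \
    flink-metrics-prometheus-1.10.0.jar \
    jna-4.2.2.jar \
    jna-platform-4.2.2.jar \
    oshi-core-3.4.0.jar
  do
    # Report the jar only when no regular file with that name exists.
    [ -f "$lib_dir/$jar" ] || echo "$jar"
  done
}
```

Usage: missing_metric_jars /home/admin/flink-1.10.0/lib prints nothing when every jar is present.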

3.3 start the flink task

# yarn-single-job
/home/admin/flink-1.10.0/bin/flink run -m yarn-cluster -p 100 -yjm 4g -ys 10 -ytm 16g -yqu root.flink -ynm etl_test  \
/home/admin/tiangx/applog_etl/jar_test/applog_etl-1.0-SNAPSHOT-jar-with-dependencies.jar \
--input-topic applog_raw \
--output-topic applog_test \
--bootstrap.servers 10.19.171.177:9092 \
--zookeeper.connect 10.19.171.177:2181 \
--group.id flink_applog_etl_test \
--redis 10.10.152.217 > /dev/null 2>&1 &

3.4 clear historical metrics in prometheus

When a flink task is restarted, the historical metrics in prometheus are not cleared automatically, which hurts the monitoring experience (stopped tasks still show up). It is recommended to clear them manually in one of the following two ways:

3.4.1 delete all metrics through the webUI of pushgateway:

3.4.2 delete metrics through the pushgateway api:

curl -X PUT http://localhost:9091/api/v1/admin/wipe
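
Wiping removes every group held by the pushgateway. To remove only one stopped task, the pushgateway also accepts an HTTP DELETE on /metrics/job/<job name>. The sketch below assumes the task was pushed with no grouping labels other than job, and that the job name (jobName plus the random suffix, as displayed in the pushgateway webUI) contains only URL-safe characters. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: URL of the metric group that holds one pushed job.
# $1 = pushgateway host:port, $2 = job name as displayed in the pushgateway webUI
pushgateway_job_url() {
  printf 'http://%s/metrics/job/%s\n' "$1" "$2"
}

# Hypothetical helper: delete a single job's metrics instead of wiping everything.
# Needs curl and a reachable pushgateway.
delete_pushgateway_job() {
  curl -sS -X DELETE "$(pushgateway_job_url "$1" "$2")"
}
```

Usage: delete_pushgateway_job localhost:9091 <job name>, with the job name copied from the webUI.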

4. Grafana

4.1 install and start grafana

Download:
docker pull grafana/grafana

Start:
docker run -d --name=grafana -p 3000:3000 grafana/grafana

Open grafana for the first time at http://localhost:3000/ and click skip to skip password verification. The second time you open grafana, password verification is required. The default user is admin and the default password is admin. After logging in, you are prompted to change the password.

4.2 configure data source

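Add one data source for each backend, using the addresses and credentials from the earlier steps:

Configure influxdb: URL http://10.42.63.116:8086, database flink, user flink and the password set in step 1.4

Configure prometheus: URL http://10.42.63.116:9090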
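
A data source can also be registered without the web form, through the HTTP API of grafana (POST /api/datasources). The following is a minimal sketch for the prometheus data source, assuming the default admin:admin login is still valid. The function names, the GRAFANA_URL and GRAFANA_AUTH variables, and the data source name flink-prometheus are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: JSON body describing a prometheus data source.
# $1 = display name, $2 = prometheus base URL (neither may contain double quotes)
prometheus_datasource_json() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}\n' "$1" "$2"
}

# Hypothetical helper: register the data source through the grafana HTTP API.
# Needs curl and a reachable grafana; GRAFANA_AUTH has the form user:password.
add_prometheus_datasource() {
  curl -sS -X POST "${GRAFANA_URL:-http://localhost:3000}/api/datasources" \
    -u "${GRAFANA_AUTH:-admin:admin}" \
    -H 'Content-Type: application/json' \
    -d "$(prometheus_datasource_json "$1" "$2")"
}
```

Usage: add_prometheus_datasource flink-prometheus http://10.42.63.116:9090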

4.3 download the grafana template

https://grafana.com/grafana/dashboards
Search for a flink metrics template and download it

4.4 import template to grafana

Click "import" to import the downloaded template, and then open the dashboard.

The downloaded grafana dashboard may need to be adjusted again to display correctly.


Posted on Sat, 20 Jun 2020 01:41:30 -0400 by lurius