Flume study notes

Official documentation ...
Core components
Flume installation prerequisites
Installation
Requirement 1
Requirement 2
Requirement 3

Core components

  1. Source: collects the data

  2. Channel: buffers (aggregates) events between the source and the sink

  3. Sink: writes the events out to their destination

Flume installation prerequisites

  • Java Runtime Environment - Java 1.8 or later
  • Memory - Sufficient memory for configurations used by sources, channels or sinks
  • Disk Space - Sufficient disk space for configurations used by channels or sinks
  • Directory Permissions - Read/Write permissions for directories used by agent
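The Java requirement can be checked from the shell before installing. What follows is a minimal sketch with hypothetical helper names (they are not part of Flume). It assumes `java -version` prints its version in double quotes on stderr, as Oracle and OpenJDK builds do (e.g. `java version "1.8.0_91"`). Java 8 and earlier use the legacy `1.x` numbering, so the major version is the second field there and the first field from Java 9 on.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the "Java 1.8 or later" prerequisite.

# java_major VERSION -> prints the major Java version.
#   "1.8.0_91" -> 8   (legacy 1.x scheme, Java 8 and earlier)
#   "11.0.2"   -> 11  (Java 9+ scheme)
java_major() {
  local v=$1
  if [[ $v == 1.* ]]; then
    v=${v#1.}            # drop the legacy "1." prefix
  fi
  echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Prints the quoted version string reported by `java -version` (written to stderr).
# Prints nothing if java is not on PATH.
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

# Succeeds if the java on PATH is version 8 (1.8) or newer.
java_ok() {
  local v
  v=$(installed_java_version)
  [ -n "$v" ] && [ "$(java_major "$v")" -ge 8 ]
}
```

For example, `java_ok && echo "Java OK" || echo "Java 1.8+ not found"`.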

Installation

  1. Install the JDK.
  2. Download Flume and extract it to a directory under your user account.
  3. Configure the environment variables:
    export FLUME_HOME="/Users/gaowenfeng/Documents/bigdata/flume"
    export PATH=$FLUME_HOME/bin:$PATH
  4. Apply them by running source on the profile file you edited.
  5. Configure $FLUME_HOME/conf/flume-env.sh and set JAVA_HOME:
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
  6. Verify the installation: $FLUME_HOME/bin/flume-ng version
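Steps 3 to 5 can also be scripted. This is a sketch under stated assumptions: a bash-style profile file (for example `~/.bash_profile`; the right file depends on your shell) and the `conf/flume-env.sh.template` file that the Flume binary distribution ships with. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for install steps 3-5.

# add_flume_env PROFILE FLUME_DIR
# Appends the FLUME_HOME/PATH exports to PROFILE only once, so it is safe to re-run.
add_flume_env() {
  local profile=$1 flume_dir=$2
  touch "$profile"
  if ! grep -q '^export FLUME_HOME=' "$profile"; then
    printf 'export FLUME_HOME="%s"\n' "$flume_dir" >> "$profile"
    # Single quotes keep $FLUME_HOME and $PATH unexpanded until the profile is sourced.
    echo 'export PATH=$FLUME_HOME/bin:$PATH' >> "$profile"
  fi
}

# set_flume_java_home FLUME_DIR JAVA_DIR
# Creates conf/flume-env.sh from the bundled template if it does not exist yet
# (assumption: the template is present), then appends a JAVA_HOME export to it.
set_flume_java_home() {
  local env_file="$1/conf/flume-env.sh" java_dir=$2
  if [ ! -f "$env_file" ] && [ -f "$env_file.template" ]; then
    cp "$env_file.template" "$env_file"
  fi
  printf 'export JAVA_HOME="%s"\n' "$java_dir" >> "$env_file"
}
```

For example, `add_flume_env ~/.bash_profile "$HOME/Documents/bigdata/flume"` followed by `source ~/.bash_profile` covers steps 3 and 4, and `set_flume_java_home "$FLUME_HOME" /Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home` covers step 5.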

Requirement 1

The key to using Flume is writing the configuration file:
  1. Configure the source
  2. Configure the channel
  3. Configure the sink
  4. Wire the three components together (bind the source and the sink to the channel)
# example.conf: A single-node Flume configuration
# a1: agent name
# r1: source name
# k1: sink name
# c1: channel name
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
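Two common mistakes are forgetting the binding lines at the end and starting the agent with a `--name` that does not match the prefix used in the file (`a1` here). Note that a source binds with the plural `channels` (it can feed several channels), while a sink binds with the singular `channel` (it drains exactly one). The checker below is a hypothetical sketch, not a Flume tool, and assumes the simple one-`key = value`-per-line layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a Flume properties file.

# conf_get FILE KEY -> prints the value of KEY ("key = value" lines; '#' lines ignored).
conf_get() {
  awk -v k="$2" '
    /^[ \t]*#/ { next }
    {
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1); val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == k) { print val; exit }
    }' "$1"
}

# check_bindings FILE AGENT
# Returns 0 if AGENT declares sources and every source/sink has its channel binding.
check_bindings() {
  local file=$1 agent=$2 status=0 name
  if [ -z "$(conf_get "$file" "$agent.sources")" ]; then
    echo "no sources declared for agent '$agent' (does --name match the file?)" >&2
    return 1
  fi
  for name in $(conf_get "$file" "$agent.sources"); do
    if [ -z "$(conf_get "$file" "$agent.sources.$name.channels")" ]; then
      echo "source '$name' is missing $agent.sources.$name.channels" >&2
      status=1
    fi
  done
  for name in $(conf_get "$file" "$agent.sinks"); do
    if [ -z "$(conf_get "$file" "$agent.sinks.$name.channel")" ]; then
      echo "sink '$name' is missing $agent.sinks.$name.channel" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_bindings $FLUME_HOME/conf/example.conf a1` succeeds silently for the file above, while `check_bindings $FLUME_HOME/conf/example.conf a2` reports that no sources are declared.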
Start the agent
flume-ng agent \
  --name a1 \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/example.conf \
  -Dflume.root.logger=INFO,console
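Test with telnet, connecting to the host and port the netcat source binds to:
telnet localhost 44444
After typing hello, the agent console logs:
Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is the basic unit of data transfer in Flume: Event = optional headers + a byte-array body.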
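The body is logged as raw bytes in hex, followed by a printable rendering. telnet ends each line with CR LF and the netcat source splits lines on LF, so the CR (`0D`) stays in the body and is rendered as `.`. The hypothetical helper below is a minimal sketch that reproduces the hex part with `od`, so you can predict what a given input will look like; it does not talk to Flume.

```shell
#!/usr/bin/env bash
# body_hex STRING -> prints the bytes of STRING as uppercase, space-separated hex pairs.
# Hypothetical helper that mirrors the hex column of the logger sink output.
body_hex() {
  # od: one " xx" per byte (-v disables "*" duplicate-line folding);
  # tr -s: join od's output lines into one and squeeze the separators;
  # sed: trim the leading/trailing space; last tr: uppercase like the logger.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//' | tr 'a-f' 'A-F'
}
```

For example, `body_hex $'hello\r'` prints `68 65 6C 6C 6F 0D`, matching the event logged above.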

Requirement 2

Agent selection: exec source + memory channel + logger sink

exec-memory-logger.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /Users/gaowenfeng/data/data.log
a1.sources.r1.shell = /bin/sh -c
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

flume-ng agent \
  --name a1 \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/exec-memory-logger.conf \
  -Dflume.root.logger=INFO,console
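With the agent running, the exec source only emits lines appended to the file after `tail -F` has started, one event per line. The hypothetical helper below is a minimal sketch that appends numbered test lines (optionally with a delay) so you can watch them appear in the agent console; the file argument should be whatever path the `command` property tails.

```shell
#!/usr/bin/env bash
# append_test_lines FILE [COUNT] [DELAY_SECONDS]
# Hypothetical helper: appends COUNT numbered lines to FILE,
# sleeping DELAY_SECONDS after each one (default: 5 lines, no delay).
append_test_lines() {
  local file=$1 count=${2:-5} delay=${3:-0} i
  mkdir -p "$(dirname "$file")"
  for ((i = 1; i <= count; i++)); do
    echo "test line $i" >> "$file"
    sleep "$delay"
  done
}
```

For example, `append_test_lines /Users/gaowenfeng/data/data.log 10 1` writes ten lines, one per second.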

Requirement 3

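Technology selection (two agents, where the avro sink of the first sends events to the avro source of the second):

exec source + memory channel + avro sink (agent exec-memory-avro)
avro source + memory channel + logger sink (agent avro-memory-logger)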

exec-memory-avro.conf

# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /Users/gaowenfeng/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = localhost
exec-memory-avro.sinks.avro-sink.port = 44444
# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory
# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

avro-memory-logger.conf

# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = localhost
avro-memory-logger.sources.avro-source.port = 44444
# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger
# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory
# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
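The two files only work together if the avro sink in exec-memory-avro points at the host and port where the avro source in avro-memory-logger listens (44444 in both files above). This hypothetical check is a minimal sketch that compares the two `port` values and prints both endpoints. It assumes the same one-property-per-line layout and includes a small `key = value` reader so the block stands alone.

```shell
#!/usr/bin/env bash
# conf_get FILE KEY -> prints the value of KEY from a "key = value" properties file.
conf_get() {
  awk -v k="$2" '
    /^[ \t]*#/ { next }
    {
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1); val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == k) { print val; exit }
    }' "$1"
}

# avro_ports_match UP_FILE SINK_PREFIX DOWN_FILE SOURCE_PREFIX
# Hypothetical check: succeeds if the avro sink's port equals the avro source's port.
avro_ports_match() {
  local sink_port source_port
  sink_port=$(conf_get "$1" "$2.port")
  source_port=$(conf_get "$3" "$4.port")
  echo "sink sends to $(conf_get "$1" "$2.hostname"):$sink_port," \
       "source listens on $(conf_get "$3" "$4.bind"):$source_port"
  [ -n "$sink_port" ] && [ "$sink_port" = "$source_port" ]
}
```

For the files above: `avro_ports_match exec-memory-avro.conf exec-memory-avro.sinks.avro-sink avro-memory-logger.conf avro-memory-logger.sources.avro-source`.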

Start the downstream agent (avro-memory-logger) first, so that its avro source is listening on port 44444:

flume-ng agent \
  --name avro-memory-logger \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
  -Dflume.root.logger=INFO,console

Then start the upstream agent (exec-memory-avro):

flume-ng agent \
  --name exec-memory-avro \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
  -Dflume.root.logger=INFO,console
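The order matters because the avro sink connects to the avro source as a client, so the downstream agent should already be listening; otherwise the upstream agent will likely log connection errors until the source comes up. Instead of watching the console, a script can wait for the port. This is a minimal sketch using bash's `/dev/tcp` redirection (a bash feature, not available in plain `sh`); the function name and default timeout are hypothetical.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until a TCP connection to HOST:PORT succeeds.
# Returns 0 when the port accepts connections, 1 once TIMEOUT_SECONDS (default 30) have passed.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} waited=0
  # The subshell opens a TCP connection via bash's /dev/tcp and closes it on exit.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, start avro-memory-logger in the background (append `&` to its command), then run `wait_for_port localhost 44444 60` and start exec-memory-avro only if it succeeds.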

Log collection process

  1. Machine A monitors a log file: when users visit the main site, their behavior is recorded in access.log, which the exec source tails.
  2. The avro sink sends the newly appended log lines to the hostname and port on which the corresponding avro source is listening.
  3. The agent that owns the avro source outputs the log to its destination: the console here, or Kafka in a real pipeline.
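On two separate machines, machine A's exec-memory-avro.conf differs from the single-machine version only in the tailed file (access.log) and the avro sink's hostname, which must be machine B's address rather than localhost. On machine B, the avro source would then bind to an interface reachable from A (for example 0.0.0.0). The generator below is a hypothetical sketch that prints machine A's file from those parameters, reusing the component names from exec-memory-avro.conf.

```shell
#!/usr/bin/env bash
# render_exec_memory_avro LOG_FILE COLLECTOR_HOST COLLECTOR_PORT
# Hypothetical generator: prints an exec-memory-avro agent config for machine A,
# with the same keys as exec-memory-avro.conf above.
render_exec_memory_avro() {
  local log_file=$1 host=$2 port=$3
  cat <<EOF
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F $log_file
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = $host
exec-memory-avro.sinks.avro-sink.port = $port
exec-memory-avro.channels.memory-channel.type = memory
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
EOF
}
```

For example, `render_exec_memory_avro /path/to/access.log 192.168.1.20 44444 > $FLUME_HOME/conf/exec-memory-avro.conf`, where the log path and collector address are placeholders for your own values.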

5 May 2020, 23:01
