Flume study notes

Official documents

Core components

  1. Source: collects data

  2. Channel: buffers and aggregates events

  3. Sink: outputs events to their destination

Flume installation prerequisites

  • Java Runtime Environment - Java 1.8 or later
  • Memory - Sufficient memory for configurations used by sources, channels or sinks
  • Disk Space - Sufficient disk space for configurations used by channels or sinks
  • Directory Permissions - Read/Write permissions for directories used by agent

Installation

  1. Install the JDK
  2. Download Flume and extract it to your user directory
  3. Configure the environment variables
    export FLUME_HOME="/Users/gaowenfeng/Documents/bigdata/flume"
    export PATH=$FLUME_HOME/bin:$PATH

  4. Run source on your shell profile so the changes take effect
  5. Set JAVA_HOME in $FLUME_HOME/conf/flume-env.sh
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
  6. Verify the installation: $FLUME_HOME/bin/flume-ng version
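
The installation steps above can be checked before starting any agent. The following is a minimal sketch, assuming a POSIX shell; check_flume_env is a hypothetical helper written for these notes, not something shipped with Flume.

```shell
# Hypothetical helper (not part of Flume): report whether the
# environment from the installation steps looks usable.
check_flume_env() {
  # FLUME_HOME must be set and non-empty
  if [ -z "${FLUME_HOME:-}" ]; then
    echo "FLUME_HOME is not set"
    return 1
  fi
  # The launcher script must exist and be executable
  if [ ! -x "$FLUME_HOME/bin/flume-ng" ]; then
    echo "flume-ng not found or not executable under $FLUME_HOME/bin"
    return 1
  fi
  echo "ok: $FLUME_HOME/bin/flume-ng"
}
```

Calling check_flume_env after sourcing the profile prints the launcher path on success; running flume-ng version afterwards confirms that the JDK is picked up as well.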

Requirement 1

The key to using Flume is writing the configuration file

  1. Configure the Source
  2. Configure the Channel
  3. Configure the Sink
  4. Wire the three components together

# example.conf: A single-node Flume configuration

# a1: agent name
# r1: source name
# k1: sink name
# c1: channel name

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory


# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Agent

flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console
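
The value passed to --name has to match the prefix used in the configuration file (a1 here). As far as I understand, when the two differ the agent process still starts but loads no components. A minimal sketch of a pre-flight check follows, assuming a POSIX shell with grep; agent_defined is a hypothetical helper, not a Flume command.

```shell
# Hypothetical helper (not a Flume command): succeed only if the config
# file declares sources, channels and sinks for the given agent name.
agent_defined() {
  agent=$1
  conf_file=$2
  for kind in sources channels sinks; do
    # Look for a line such as "a1.sources = r1"
    grep -q "^${agent}\.${kind}[[:space:]]*=" "$conf_file" || return 1
  done
}
```

For example, agent_defined a1 "$FLUME_HOME/conf/example.conf" && echo "a1 is defined" can be run before the flume-ng command above.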

Test with telnet

telnet localhost 44444

After typing hello in the telnet session, the agent console prints:

Event: { headers:{} body: 68 65 6C 6C 6F 0D                               hello. }

An Event is Flume's basic unit of data transfer.
Event = optional headers + byte array
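
The body shown by the logger sink is the event's byte array printed as hex. The sketch below decodes such a dump back into text, assuming a POSIX shell whose printf accepts 0x-prefixed numbers; hex_to_text is a hypothetical helper. The trailing 0D is a carriage return, most likely left over because telnet ends each line with CR LF and the netcat source splits on the line feed.

```shell
# Hypothetical helper: turn a space-separated hex dump, as printed by
# the logger sink, back into the bytes it represents.
hex_to_text() {
  for byte in $1; do
    # Convert the hex byte to a 3-digit octal escape and print it
    printf "\\$(printf '%03o' "0x$byte")"
  done
}
```

Calling hex_to_text "68 65 6C 6C 6F" writes hello to standard output.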

Requirement 2

Agent selection: exec source + memory channel + logger sink

exex-memory-logger.conf


# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /Users/gaowenfeng/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory


# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start agent

flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exex-memory-logger.conf \
-Dflume.root.logger=INFO,console
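
To see this agent working, new lines have to be appended to the file that tail -F follows. A minimal sketch, assuming a POSIX shell; append_test_lines is a hypothetical helper and the line text is arbitrary.

```shell
# Hypothetical helper: append N numbered test lines to a log file so
# that the exec source has something new to pick up.
append_test_lines() {
  log_file=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    echo "test line $i" >> "$log_file"
    i=$((i + 1))
  done
}
```

Running append_test_lines /Users/gaowenfeng/data/data.log 3 in a second terminal should make the agent console print one Event per appended line.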

Requirement 3

Technology selection:

    exec source + memory channel + avro sink
    avro source + memory channel + logger sink

exec-memory-avro.conf


# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /Users/gaowenfeng/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = localhost
exec-memory-avro.sinks.avro-sink.port = 44444

# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory


# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

avro-memory-logger.conf


# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = localhost
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory


# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel

Start the avro-memory-logger agent first, so that its avro source is listening

flume-ng agent \
--name avro-memory-logger \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
-Dflume.root.logger=INFO,console

Then start the exec-memory-avro agent

flume-ng agent \
--name exec-memory-avro \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
-Dflume.root.logger=INFO,console

Log collection process

  1. On machine A, a file is monitored: when users visit the main site, their behavior is logged to access.log
  2. The avro sink sends newly generated log lines to the hostname and port on which the corresponding avro source listens
  3. The agent that hosts the avro source writes the logs to the console (or to Kafka)

Tags: Java JDK shell Kafka

Posted on Tue, 05 May 2020 23:01:27 -0400 by AustinP