Core components
- Source: collection
- Channel: aggregation
- Sink: output
Flume installation prerequisites
- Java Runtime Environment - Java 1.8 or later
- Memory - Sufficient memory for configurations used by sources, channels or sinks
- Disk Space - Sufficient disk space for configurations used by channels or sinks
- Directory Permissions - Read/Write permissions for directories used by agent
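The Java prerequisite can be checked from the shell before installing anything else. The following is a minimal sketch; java_major and check_java are hypothetical helper names, not part of Flume, and the parsing assumes the usual `java -version` output with the version string in double quotes.

```shell
# Hypothetical helper: map a Java version string to its major version.
# Legacy scheme: 1.8.0_91 -> 8. Modern scheme: 11.0.2 -> 11.
java_major() {
  case "$1" in
    1.*) major=$(echo "$1" | cut -d. -f2) ;;
    *)   major=$(echo "$1" | cut -d. -f1) ;;
  esac
  # Drop any non-numeric suffix such as "-ea".
  echo "$major" | sed 's/[^0-9].*$//'
}

# Hypothetical helper: check that the java on PATH is 1.8 or later.
check_java() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH"
    return 1
  fi
  # `java -version` writes to stderr; the version is the first quoted field.
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$ver")
  if [ -n "$major" ] && [ "$major" -ge 8 ]; then
    echo "Java $ver satisfies the 1.8+ requirement"
  else
    echo "Java version '$ver' is too old or could not be parsed"
    return 1
  fi
}
```

Calling check_java prints a one-line verdict and returns a non-zero status when the requirement is not met.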
Installation
- Install the JDK
- Download Flume and extract it to a user directory
- Configure environment variables
    export FLUME_HOME="/Users/gaowenfeng/Documents/bigdata/flume"
    export PATH=$FLUME_HOME/bin:$PATH
- Run source on the profile file so the variables take effect
- Configure flume-env.sh
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
- Verify the installation: $FLUME_HOME/bin/flume-ng version
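The installation steps can be sanity-checked with a short shell sketch. check_flume_env is a hypothetical helper name, not something shipped with Flume; it only assumes that FLUME_HOME and PATH were exported as in the steps above.

```shell
# Hypothetical helper (not part of Flume): verify the environment
# variables configured during installation.
check_flume_env() {
  if [ -z "$FLUME_HOME" ]; then
    echo "FLUME_HOME is not set"
    return 1
  fi
  if [ ! -x "$FLUME_HOME/bin/flume-ng" ]; then
    echo "flume-ng is missing or not executable under $FLUME_HOME/bin"
    return 1
  fi
  # Look for $FLUME_HOME/bin as a complete PATH entry.
  case ":$PATH:" in
    *":$FLUME_HOME/bin:"*) ;;
    *) echo "$FLUME_HOME/bin is not on PATH"; return 1 ;;
  esac
  echo "Flume environment looks consistent"
}
```

If the check passes, running flume-ng version should work from any directory.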
Requirement 1
The key to using Flume is writing the configuration file:
- Configure the source
- Configure the channel
- Configure the sink
- Wire the three components together
    # example.conf: A single-node Flume configuration
    # a1: agent name
    # r1: source name
    # k1: sink name
    # c1: channel name

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
Start Agent
    flume-ng agent \
      --name a1 \
      --conf $FLUME_HOME/conf \
      --conf-file $FLUME_HOME/conf/example.conf \
      -Dflume.root.logger=INFO,console
Test with telnet
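    telnet <host> <port>
For the configuration above this is telnet localhost 44444. Typing hello in the telnet session makes the logger sink print:
    Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is Flume's basic unit of data transmission. Event = optional headers + byte array.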
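To see that the body really is a plain byte array, the hex dump from the log line can be decoded by hand. This is only an illustrative sketch; hex_to_bytes is a hypothetical helper, not a Flume tool. The trailing 0D is a carriage return, most likely left over from the CR LF line ending that telnet sends, which is why the logger output shows a dot after hello.

```shell
# Hypothetical helper: turn a space-separated hex dump into raw bytes.
# Relies on normal sh/bash word splitting of the unquoted argument.
hex_to_bytes() {
  for h in $1; do
    # Convert each hex pair to a 3-digit octal escape and print that byte.
    printf "\\$(printf '%03o' "0x$h")"
  done
}

# Decode the body from the log line and show each byte, including the CR.
hex_to_bytes "68 65 6C 6C 6F 0D" | od -c
```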
Requirement 2
Agent selection: exec source + memory channel + logger sink
exex-memory-logger.conf
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /Users/gaowenfeng/data/data.log
    a1.sources.r1.shell = /bin/sh -c

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
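A common mistake in these files is the last block: a source is bound with channels (plural) and a sink with channel (singular), and both must name a channel declared on the agent. The sketch below checks that wiring before the agent is started. check_wiring is a hypothetical helper built on awk, not a Flume command, and it assumes one key = value property per line.

```shell
# Hypothetical helper: check that every channel referenced by a source or
# sink of the given agent is declared in "<agent>.channels".
# Usage: check_wiring <conf-file> <agent-name>
check_wiring() {
  conf=$1
  agent=$2
  awk -v a="$agent" '
    /^[ \t]*#/ || !/=/ { next }   # skip comments and non-property lines
    {
      key = $0; sub(/[ \t]*=.*/, "", key); gsub(/^[ \t]+/, "", key)
      val = $0; sub(/^[^=]*=[ \t]*/, "", val); sub(/[ \t]+$/, "", val)
      if (key == a ".channels") {
        # Remember every declared channel name.
        n = split(val, d, /[ \t]+/)
        for (i = 1; i <= n; i++) declared[d[i]] = 1
      } else if (key ~ ("^" a "\\.(sources|sinks)\\.[^.]+\\.channels?$")) {
        # Remember every source/sink binding for the final check.
        refs[key] = val
      }
    }
    END {
      bad = 0
      for (k in refs) {
        n = split(refs[k], r, /[ \t]+/)
        for (i = 1; i <= n; i++)
          if (!(r[i] in declared)) {
            print k " references undeclared channel " r[i]
            bad = 1
          }
      }
      exit bad
    }' "$conf"
}
```

For the file above, the call would be check_wiring $FLUME_HOME/conf/exex-memory-logger.conf a1; it prints nothing and returns 0 when the bindings are consistent.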
Start agent
    flume-ng agent \
      --name a1 \
      --conf $FLUME_HOME/conf \
      --conf-file $FLUME_HOME/conf/exex-memory-logger.conf \
      -Dflume.root.logger=INFO,console
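Once the agent is running, the exec source only produces events when new lines are appended to the monitored file. A small sketch for generating test input follows; emit_lines is a hypothetical helper and the line format is arbitrary.

```shell
# Hypothetical helper: append <count> timestamped test lines to <file>,
# creating the parent directory if it does not exist yet.
# Usage: emit_lines <file> <count>
emit_lines() {
  file=$1
  count=$2
  mkdir -p "$(dirname "$file")"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "test line $i $(date '+%Y-%m-%d %H:%M:%S')" >> "$file"
    i=$((i + 1))
  done
}
```

Running emit_lines /Users/gaowenfeng/data/data.log 5 in a second terminal should make five events appear in the agent's console output.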
Requirement 3
Technology selection:
- exec source + memory channel + avro sink
- avro source + memory channel + logger sink
exec-memory-avro.conf
    # Name the components on this agent
    exec-memory-avro.sources = exec-source
    exec-memory-avro.sinks = avro-sink
    exec-memory-avro.channels = memory-channel

    # Describe/configure the source
    exec-memory-avro.sources.exec-source.type = exec
    exec-memory-avro.sources.exec-source.command = tail -F /Users/gaowenfeng/data/data.log
    exec-memory-avro.sources.exec-source.shell = /bin/sh -c

    # Describe the sink
    exec-memory-avro.sinks.avro-sink.type = avro
    exec-memory-avro.sinks.avro-sink.hostname = localhost
    exec-memory-avro.sinks.avro-sink.port = 44444

    # Use a channel which buffers events in memory
    exec-memory-avro.channels.memory-channel.type = memory

    # Bind the source and sink to the channel
    exec-memory-avro.sources.exec-source.channels = memory-channel
    exec-memory-avro.sinks.avro-sink.channel = memory-channel
avro-memory-logger.conf
    # Name the components on this agent
    avro-memory-logger.sources = avro-source
    avro-memory-logger.sinks = logger-sink
    avro-memory-logger.channels = memory-channel

    # Describe/configure the source
    avro-memory-logger.sources.avro-source.type = avro
    avro-memory-logger.sources.avro-source.bind = localhost
    avro-memory-logger.sources.avro-source.port = 44444

    # Describe the sink
    avro-memory-logger.sinks.logger-sink.type = logger

    # Use a channel which buffers events in memory
    avro-memory-logger.channels.memory-channel.type = memory

    # Bind the source and sink to the channel
    avro-memory-logger.sources.avro-source.channels = memory-channel
    avro-memory-logger.sinks.logger-sink.channel = memory-channel
Start the avro-memory-logger agent first
    flume-ng agent \
      --name avro-memory-logger \
      --conf $FLUME_HOME/conf \
      --conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
      -Dflume.root.logger=INFO,console
Then start the exec-memory-avro agent
    flume-ng agent \
      --name exec-memory-avro \
      --conf $FLUME_HOME/conf \
      --conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
      -Dflume.root.logger=INFO,console
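The start order matters because the avro sink needs a listening avro source to connect to. When both agents are launched from a script, one option is to wait until port 44444 accepts connections before starting the second agent. The sketch below assumes a netcat (nc) that supports the -z probe flag; wait_for_port is a hypothetical helper name.

```shell
# Hypothetical helper: poll until <host>:<port> accepts TCP connections.
# Assumes an nc binary with -z (connect without sending data).
# Usage: wait_for_port <host> <port> [max-tries]
wait_for_port() {
  host=$1
  port=$2
  tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Example (commented out so the sketch has no side effects):
# wait_for_port localhost 44444 && flume-ng agent --name exec-memory-avro \
#   --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-avro.conf
```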
Log collection process
- Machine A monitors a file: when users visit the main site, their behavior logs are recorded in access.log
- The avro sink sends newly generated log lines to the hostname and port specified by the corresponding avro source
- The agent that owns the avro source then writes the logs to the console (or to Kafka)