Installing Flume on Linux

1. Prerequisites

Flume requires JDK 1.8+. For JDK setup, see:

Installing the JDK on Linux

2. Installation Steps

2.1 Download and Extract

Download the version of Flume you need; here I use the Apache release. Download URL: http://www.apache.org/dyn/closer.lua/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
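
If you want to script the download, here is a sketch using the Apache archive mirror (archive.apache.org keeps old releases, whereas the closer.lua link above normally resolves to a mirror-selection page rather than the file itself):

# download Flume 1.9.0 from the Apache archive
wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz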

# extract after downloading
tar -zxvf apache-flume-1.9.0-bin.tar.gz -C /opt/

2.2 Configure Environment Variables

# vim /etc/profile

Add the environment variables:

export FLUME_HOME=/opt/apache-flume-1.9.0-bin
export PATH=$FLUME_HOME/bin:$PATH

Make the new environment variables take effect immediately:

# source /etc/profile
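
A quick sanity check that the variables are in effect:

echo $FLUME_HOME    # should print /opt/apache-flume-1.9.0-bin
which flume-ng      # should resolve to $FLUME_HOME/bin/flume-ng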

2.3 Modify Configuration

Go into the conf/ directory under the installation directory and make a copy of Flume's environment configuration template, flume-env.sh.template:

# cp flume-env.sh.template flume-env.sh

Edit flume-env.sh and point it at your JDK installation:

# Environment variables can be set here.
export JAVA_HOME=/opt/jdk1.8.0_181
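
If you are not sure where your JDK is installed, one way to locate it (the path will differ per machine):

# resolve the real path of the java binary; JAVA_HOME is that path minus /bin/java
readlink -f $(which java)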

2.4 Verify

Since Flume's bin directory is already on the PATH, you can verify the installation directly with:

# flume-ng version

If the version information appears, the installation succeeded.

Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9

3. Testing

3.1 Flume: from a file into Kafka

flume-kafka.conf:

a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = exec
a1.sources.s1.command = tail -n0 -F /opt/gzgtest/flumekafka/kafka.log
a1.sources.s1.channels = c1

# Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka broker addresses
# (brokerList/topic are legacy names; Flume 1.9 documents them as
#  kafka.bootstrap.servers / kafka.topic, but still accepts the old ones)
a1.sinks.k1.brokerList = 192.168.73.130:9092,192.168.73.131:9092,192.168.73.132:9092
# topic to send events to
a1.sinks.k1.topic = test
# serializer (legacy Kafka property name)
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.channel = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

Start the agent:

nohup flume-ng agent -c /opt/apache-flume-1.9.0-bin/conf -f /data/flume/flume-kafka.conf -n a1 -Dflume.root.logger=INFO,console > /data/flume/nohup.out 2>&1 &

You can start kafka-console-consumer.sh to watch the messages being consumed.
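
For example, a quick end-to-end check might look like this (assuming the Kafka scripts are on the PATH, a Kafka version whose console consumer accepts --bootstrap-server, and the brokers from the config above; adjust to your environment):

# append a test line to the file the exec source is tailing
echo "hello flume $(date)" >> /opt/gzgtest/flumekafka/kafka.log

# watch the test topic; the line should show up within a few seconds
kafka-console-consumer.sh --bootstrap-server 192.168.73.130:9092 --topic test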

3.2 Flume: from a file into HDFS

flume-hdfs.conf:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -n0 -F /opt/gzgtest/flumekafka/kafka.log

# the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://gzgtest/user/hive/warehouse/tmp.db/rt_minipc_dfh_goodnews/dt=20200701

a1.sinks.k1.hdfs.filePrefix = 192.168.73.132_log_%Y%m%d%H%M
a1.sinks.k1.hdfs.inUsePrefix = .
# roll files by time only: every 600 s, never by size or event count
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.batchDurationMillis = 10000
# round the timestamp used in the path down to 10-minute buckets
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.threadsPoolSize = 250
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.callTimeout = 120000
a1.sinks.k1.hdfs.idleTimeout = 600
a1.sinks.k1.hdfs.rollTimerPoolSize = 10

Start the agent:

nohup flume-ng agent -c /opt/apache-flume-1.9.0-bin/conf -f /data/flume/flume-hdfs.conf -n a1 -Dflume.root.logger=INFO,console > /data/flume/nohup.out 2>&1 &
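
To confirm that data is landing, list the target directory (assuming the HDFS client is on the PATH; files still being written carry the in-use prefix "." so that downstream jobs such as Hive skip them):

hdfs dfs -ls hdfs://gzgtest/user/hive/warehouse/tmp.db/rt_minipc_dfh_goodnews/dt=20200701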

Note:
I run all of the above as the hadoop user, so you may hit file permission problems. Change ownership of the Flume installation and every file involved to hadoop:hadoop, as sketched below.
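
A minimal sketch of that ownership change (run as root; the exact directories depend on your layout):

chown -R hadoop:hadoop /opt/apache-flume-1.9.0-bin /data/flume /opt/gzgtest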

A problem you may run into:

2020-07-01 16:54:47,501 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)

For the solution, see: "Flume throws NoSuchMethodError: com.google.common.base.Preconditions.checkArgument at runtime"
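
This is usually a Guava version clash: Flume 1.9.0 bundles guava-11.0.2.jar, while Hadoop 3.x needs a much newer Guava. A commonly reported fix (a sketch, not an official procedure; the jar versions and paths are assumptions to check against your own installs) is to swap in Hadoop's Guava:

# drop Flume's old Guava and copy in the newer one shipped with Hadoop 3.x
rm $FLUME_HOME/lib/guava-11.0.2.jar
cp /opt/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar $FLUME_HOME/lib/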

References:
Installing Flume on Linux

The official docs are worth revisiting:
FlumeUserGuide

The meaning of Flume configuration parameters