Flume consuming Kafka: the Flume process cannot be shut down normally after startup

While learning Flume, I ran into the scenario of using Flume to consume data from Kafka, so I worked through the configuration myself. The Flume configuration file is as follows:

## Components
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2

## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092
a1.sources.r1.kafka.topics = topic_start

## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092
a1.sources.r2.kafka.topics = topic_event

## channel1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/module/flume/checkpoint/behavior1
a1.channels.c1.dataDirs = /opt/module/flume/data/behavior1/
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 1000000
a1.channels.c1.keep-alive = 6

## channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/module/flume/checkpoint/behavior2
a1.channels.c2.dataDirs = /opt/module/flume/data/behavior2/
a1.channels.c2.maxFileSize = 2146435071
a1.channels.c2.capacity = 1000000
a1.channels.c2.keep-alive = 6

## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-

## sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-

## Avoid producing large numbers of small files
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k2.hdfs.rollInterval = 3600
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0

## Output files as compressed streams
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k2.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = snappy
a1.sinks.k2.hdfs.codeC = snappy

## Wiring: bind sources and sinks to their channels
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sources.r2.channels = c2
a1.sinks.k2.channel = c2
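For reference, a sketch of how I would start an agent with this definition (the file name kafka-flume-hdfs.conf and the job/ directory are my own placeholders, not from the original setup):

# Start agent a1 in the foreground with console logging.
# --conf points at Flume's conf/ directory, --conf-file at the agent definition above.
bin/flume-ng agent \
  --name a1 \
  --conf conf \
  --conf-file job/kafka-flume-hdfs.conf \
  -Dflume.root.logger=INFO,console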

Incidentally, the line a1.sinks.k1.hdfs.codeC = snappy above was originally configured as lzop. The Flume startup log, however, showed that my current Hadoop installation does not support the lzop codec, so the sink failed configuration and was removed; that is why I later switched to snappy.

20/07/31 18:30:45 ERROR node.AbstractConfigurationProvider: Sink k1 has been removed due to an error during configuration
java.lang.IllegalArgumentException: Unsupported compression codec lzop. Please choose from: [None, BZip2Codec, DefaultCodec, DeflateCodec, GzipCodec, Lz4Codec, SnappyCodec]
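Before settling on a codec, it helps to check what the local Hadoop build actually supports. A minimal check with a standard Hadoop command (nothing specific to this setup):

# List the native compression libraries this Hadoop build can load.
# snappy should report true here; lzop is not part of stock Hadoop at
# all and requires installing the separate hadoop-lzo library and
# registering its codec, which explains the error above.
hadoop checknative -a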

After Flume started, jps did indeed show an Application process.

[Screenshot: jps output showing the Flume Application process]

The Flume startup log contained no errors either. But then something odd happened: the Flume process could not be shut down normally. Running kill <pid> had no effect, while kill -9 <pid> did kill it.

I do know the difference between the two: kill -9 sends SIGKILL, which the process cannot catch, so the kernel terminates it forcibly; because the process gets no chance to clean up, files and data may be lost, and this option should be avoided where possible. Plain kill sends SIGTERM to the process itself, asking it to shut down gracefully on its own.
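In shell terms (the pid 12345 below is illustrative):

# Find the Flume agent's pid; the agent runs under the main class
# org.apache.flume.node.Application, which jps abbreviates to "Application".
jps -ml | grep flume

# Plain kill sends SIGTERM (15): the JVM can catch it, run its
# shutdown hooks, and stop sources, channels and sinks cleanly.
kill 12345          # equivalent to: kill -15 12345

# kill -9 sends SIGKILL: the kernel terminates the process at once,
# shutdown hooks never run, and buffered data may be lost.
kill -9 12345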

Since kill cannot stop the Flume process here, the best explanation I can think of is that the process is not acting on the signal, so it never carries out the corresponding shutdown steps. (In a JVM, SIGTERM triggers the registered shutdown hooks; if a shutdown hook blocks, or a non-daemon thread refuses to exit, the process lingers.)
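One way to dig further, assuming the JVM is still responsive: dump its threads while it is refusing to exit and see what is blocking. This is a generic JVM diagnostic, not something from the original troubleshooting:

# Dump all thread stacks of the stuck Flume JVM (pid is illustrative).
# If SIGTERM was delivered, a shutdown-hook thread should be visible;
# whatever it is blocked on points at the culprit. Non-daemon threads
# that never finish will also keep the JVM alive.
jstack 12345 > flume-threads.txt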

As for what exactly is going wrong inside the Flume process, I still don't know; I hope to find the answer as my studies continue.
