DataNode OOM溢出

DataNode的内存溢出报错

2017-12-17 23:58:14,422 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1437036909-![img](file:///C:\Users\Administrator\AppData\Roaming\Tencent\QQTempSys\%W@GJ$ACOF(TYDYECOKVDYB.png)192.168.17.36-1509097205664:blk_1074725940_987917, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-12-17 23:58:31,425 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
at java.lang.Thread.run(Thread.java:745)
2017-12-17 23:59:01,426 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
at java.lang.Thread.run(Thread.java:745)
2017-12-17 23:59:05,520 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:154)
at java.lang.Thread.run(Thread.java:745)
2017-12-17 23:59:31,429 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1437036909-![img](file:///C:\Users\Administrator\AppData\Roaming\Tencent\QQTempSys\%W@GJ$ACOF(TYDYECOKVDYB.png)192.168.17.36-1509097205664:blk_1074725951_987928 src: /![img](file:///C:\Users\Administrator\AppData\Roaming\Tencent\QQTempSys\%W@GJ$ACOF(TYDYECOKVDYB.png)192.168.17.54:40478 dest: /![img](file:///C:\Users\Administrator\AppData\Roaming\Tencent\QQTempSys\%W@GJ$ACOF(TYDYECOKVDYB.png)192.168.17.48:50010

CDH查看DataNode的内存情况

datanode内存使用图

明明1G的内存都没有使用,为什么会报OOM?

可以确定是操作系统哪里设置错了,我想应该是把产品环境的某个参数配置错了,系统本身的影响肯定不会有了,因为产品环境上我们只create了800左右个线程,就OOM了,那应该就是配置的问题了

解决方法:

1.

echo "kernel.threads-max=196605" >> /etc/sysctl.conf
echo "kernel.pid_max=196605" >> /etc/sysctl.conf
echo "vm.max_map_count=393210" >> /etc/sysctl.conf
sysctl -p

2.

/etc/security/limits.conf
* soft nofile 196605
* hard nofile 196605
* soft nproc 196605
* hard nproc 196605
Author: Tunan
Link: http://yerias.github.io/2018/10/09/PE/1/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.