I have a script that merges all the small files present in an hourly directory on HDFS into one big file. The script runs fine when executed via the CLI. I then set the script to run every day at 01:30 to merge the previous day's files, but it doesn't work. I exported PATH, HADOOP_HOME, and HADOOP_CONF_DIR at the top of the script, and I changed the ownership from my user to root, but nothing helped. Here is my script:
#!/bin/bash
# Export the environment explicitly, since cron starts with a minimal
# environment and would not otherwise find the hadoop CLI.
export PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/java/jdk1.8.0/bin:/home/hadoopuser/hadoop/bin:/home/hadoopuser/zookeeper/bin:/home/hadoopuser/hive/bin:/home/hadoopuser/derby/bin:/home/hadoopuser/maven/bin:/home/hadoopuser/pig/bin:/home/hadoopuser/spark/bin:/home/hadoopuser/flume/bin:/home/hadoopuser/.local/bin:/home/hadoopuser/bin:/home/hadoopuser/user1/tmp
export HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_CONF_DIR=/home/hadoopuser/hadoop/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hadoopuser/hadoop/lib/native
export HADOOP_OPTS="-Djava.library.path=/home/hadoopuser/hadoop/lib/native"
echo "$HADOOP_HOME"
echo "$HADOOP_CONF_DIR"
echo "$PATH"

# Yesterday's month and day (both taken from "1 day ago" so that month
# boundaries are handled correctly).
mnth=$(date -d "1 day ago" +%m)
day=$(date -d "1 day ago" +%d)
echo "Running for $day-$mnth-2017"

for k in $mnth; do
  for j in $day; do
    for i in 17 18 19 20 21 22 23; do
      # Concatenate every small file for this hour into one merged file.
      # The glob is quoted so that hadoop fs expands it against HDFS.
      hadoop fs -cat "/topics/topic1/year=2017/month=$k/day=$j/hour=$i/*" | hadoop fs -put - "/merged/topic1/2017_${k}_${j}_${i}"
      # A zero-byte result means there was nothing to merge for this hour.
      hadoop fs -du -s "/merged/topic1/2017_${k}_${j}_${i}" > /home/hadoopuser/user1/merge_test/size.txt
      x=$(awk '{ print $1 }' /home/hadoopuser/user1/merge_test/size.txt)
      if [ "$x" -eq 0 ]; then
        hadoop fs -rm "/merged/topic1/2017_${k}_${j}_${i}"
      else
        echo "MERGE DONE!!! All files generated at hour $i of $j-$k-2017 merged into one"
      fi
    done
  done
done
rm -f /home/hadoopuser/user1/merge_test/size.txt
Here is the crontab -e entry I mentioned:
30 1 * * * /home/hadoopuser/user1/tmp/cron-merge-generalevents.sh > /home/hadoopuser/user1/tmp/cron-merge-generalevents.txt
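(Note that > only redirects stdout, so any error output from the script, e.g. from the hadoop commands themselves, goes to stderr and never reaches this file. A variant that captures both streams, where the trailing 2>&1 is the only addition:

30 1 * * * /home/hadoopuser/user1/tmp/cron-merge-generalevents.sh > /home/hadoopuser/user1/tmp/cron-merge-generalevents.txt 2>&1)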
All I see in /home/hadoopuser/user1/tmp/cron-merge-generalevents.txt, for every hour of the day, is:
/home/hadoopuser/hadoop
/home/hadoopuser/hadoop/etc/hadoop
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/java/jdk1.8.0/bin:/home/hadoopuser/hadoop/bin:/home/hadoopuser/zookeeper/bin:/home/hadoopuser/hive/bin:/home/hadoopuser/derby/bin:/home/hadoopuser/maven/bin:/home/hadoopuser/pig/bin:/home/hadoopuser/spark/bin:/home/hadoopuser/flume/bin:/home/hadoopuser/.local/bin:/home/hadoopuser/bin:/home/hadoopuser/user1/tmp
Running for 19-07-2017
MERGE DONE!!! All files generated at hour 17 of 19-07-2017 merged into one
MERGE DONE!!! All files generated at hour 18 of 19-07-2017 merged into one
MERGE DONE!!! All files generated at hour 19 of 19-07-2017 merged into one
MERGE DONE!!! All files generated at hour 20 of 19-07-2017 merged into one
MERGE DONE!!! All files generated at hour 21 of 19-07-2017 merged into one
MERGE DONE!!! All files generated at hour 22 of 19-07-2017 merged into one
MERGE DONE!!! All files generated at hour 23 of 19-07-2017 merged into one
Answer 1
Good advice in this case is to also export JAVA_HOME, since the Hadoop you are running depends on Java. But the best approach is to import/source all of these variables from your bash_profile at the beginning of the script:

. /path/to/.bash_profile

or

source /path/to/.bash_profile
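For example, a minimal sketch of how the top of the script could look with this advice applied (the .bash_profile location is an assumption; the JDK path is taken from the PATH in your script, so adjust both to your installation):

#!/bin/bash
# Pull in the login environment so the cron run sees the same
# PATH/HADOOP_* variables as an interactive CLI session
# (the path to .bash_profile is an assumption).
source /home/hadoopuser/.bash_profile

# Hadoop's launcher scripts look for JAVA_HOME; export it explicitly in
# case .bash_profile does not (JDK path taken from the PATH above).
export JAVA_HOME=/usr/java/jdk1.8.0
export PATH="$JAVA_HOME/bin:$PATH"

# ... rest of the merge script unchanged ...

You can also test the script under an environment as bare as cron's, instead of waiting for the next 01:30 run, for example:

env -i /bin/bash /home/hadoopuser/user1/tmp/cron-merge-generalevents.sh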