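I wrote a shell script that has to do the following:
- Capture the session commands into a file.
- Put each individual command into its own file.
- Mail the contents of each individual command file, depending on certain criteria.
From what I have observed, the loop has to iterate at least 25,000 times. My problem is that it takes more than 6 hours to complete all the iterations.
Below is the main part of the script, the part that takes so long to process.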
    if [ -s "$LOC/check.txt" ]; then
        while read line; do
            echo -e " started processing $line at `date` " >> "$SCRIPT_LOC/running_status.txt"
            TST=`grep -w $line $PERM_LOC/id_processing.txt`
            USER=`echo $TST | grep -w $line | awk -F '"' '{print $10}'`
            HOST=`echo $TST | grep -w $line | awk -F '"' '{print $18}'`
            ID=`echo $TST | echo $line | tr -d '\"'`
            IP=`echo $TST | grep -w $line | awk -F '"' '{print $20}'`
            DB=`echo $TST | grep -w $line | awk -F '"' '{print $22}'`
            CONN_TSMP=`echo $TST | grep -w $line | awk -F '"' '{print $2}'`
            if [ -z "$IP" ]; then
                IP=`echo "$HOST"`
            fi
            if [ "$USER" == "root" ] && [ -z $DB ]; then
                TARGET=/data1/sessions/root_sec
                CMD_TARGET=/data1/commands/root_commands
                FILE=`echo "$ID-$CONN_TSMP-$USER@$IP.txt"`
            else
                TARGET=/data1/sessions/user_sec
                CMD_TARGET=/data1/commands/user_commands
                FILE=`echo "$ID-$CONN_TSMP-$USER@$IP.txt"`
            fi
            ls $TARGET/$FILE
            if [ $? -ne 0 ]; then
                echo $TST | awk -F 'STATUS="0"' '{print $2}'| sed "s/[</>]//g" >> "$TARGET/$FILE"
                echo -e "\n" >> "$TARGET/$FILE"
            fi
            grep $line $LOC/out.txt > "$LOC/temp.txt"
            while read val; do
                TSMP=`echo "$val" | awk -F '"' '{print $2}'`
                QUERY=`echo "$val" | awk -F 'SQLTEXT=' '{print $2}' | sed "s/[/]//g"`
                echo " TIMESTAMP=$TSMP " >> "$TARGET/$FILE"
                echo " QUERY=$QUERY " >> "$TARGET/$FILE"
                RES=`echo "$QUERY" | awk {'print $1'} | sed 's/["]//g' `
                TEXT=`grep "$RES" "$PERM_LOC/commands.txt"`
                if [ -n "$TEXT" ]; then
                    NUM=`expr $NUM + 1`
                    SUB_FILE=`echo "$ID-$command-$NUM-$TSMP-$USER@$IP.txt"`
                    echo -e "===============\n" > "$CMD_TARGET/$SUB_FILE"
                    echo "FILE = \"$SUB_FILE\"" >> "$CMD_TARGET/$SUB_FILE"
                    ### same way append 6 more lines to $SUB_FILE
                    SUB=`echo "$WARN_ME" | grep "$command"`
                    if [ "$command" == "$VC" ]; then
                        STATE=`echo " very critical "`
                    elif [ -z "$SUB" ]; then
                        STATE=CRITICAL
                    else
                        STATE=WARNING
                    fi
                    if [ "$USER" != "root" -a "$command" != "$VC" ]; then
                        mail command &
                    elif [ "$USER" == "root" -a -z "$HOST" ]; then
                        mail command &
                    elif [ "$USER" == "root" -a "$command" == "$VC" ]; then
                        mail command &
                    else
                        echo -e "some message \n" >> $LOC/operations.txt
                    fi
                fi
            done < "$LOC/temp.txt"
        done < "$LOC/check.txt"
    fi
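Can anyone help me optimize this code, whether by splitting it up, changing the logic, using functions, or in any other way?
I have to use shell script only here, and the server running the script must not use more than 3 GB of RAM to process it.
Any help would be greatly appreciated.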
Answer 1
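Oh my!
I can see why it takes forever to run: you repeat the same operations instead of caching the information, and you are beating the computer nearly to death. Poor computer. :(
awk is not lightweight, and you are calling it many, many times on the same data. I was able to run it once and set all five variables.
Without knowing what this is supposed to do or accomplish, there is only so much that can be done.
Given that all the processing is grep, awk, sed and tr, you could get an impressive speedup by writing this script in Perl. Perl was designed for processing text and producing reports. It can do all of these grep/awk/sed/tr operations internally, without repeatedly calling out to another program.
But here are some improvements: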
    if [ -s "$LOC/check.txt" ]; then
        while read line; do
            echo " started processing ${line} at $(date) " >> "${SCRIPT_LOC}/running_status.txt"
            ID=$(echo "$line" | tr -d '"')
            # are you sure you don't want the FIRST match? This will give ALL the matches,
            # which will prevent you from getting good values for the variables
            # to only get first entry that matches:
            # TST=$(grep --max-count=1 -w "$line" "$PERM_LOC/id_processing.txt")
            # (or -m 1, but long options document what you're doing better)
            TST=$(grep -w "$line" "$PERM_LOC/id_processing.txt")
            # run awk ONCE and emit all five values, joined by the ASCII unit
            # separator (octal 037), which is assumed never to occur in the data
            # field order: CONN_TSMP USER HOST IP DB
            VARS=$(echo "${TST}" | awk -F '"' -v OFS='\037' '{print $2, $10, $18, $20, $22}')
            # magic! read splits the 5 values awk pulled out (ran it once!)
            # a non-whitespace separator keeps empty values (e.g. no IP) in place;
            # only the first line of VARS is used if grep matched several entries
            IFS=$'\037' read -r CONN_TSMP USER HOST IP DB <<< "$VARS"
            if [ -z "$IP" ]; then
                IP="$HOST"
            fi
            FILE="${ID}-${CONN_TSMP}-${USER}@${IP}.txt"
            if [ "$USER" == "root" ] && [ -z "$DB" ]; then
                TARGET="/data1/sessions/root_sec"
                CMD_TARGET="/data1/commands/root_commands"
            else
                TARGET="/data1/sessions/user_sec"
                CMD_TARGET="/data1/commands/user_commands"
            fi
            # does this need to be redirected to a file?
            ls "$TARGET/$FILE"
            if [ $? -ne 0 ]; then
                # awk can likely do the print and the removal of </> characters in
                # one pass (my awk-fu is weak this morning)
                echo "$TST" | awk -F 'STATUS="0"' '{print $2}'| sed "s/[</>]//g" >> "$TARGET/$FILE"
                echo -e "\n" >> "$TARGET/$FILE"
            fi
            # ALWAYS quote your values, embedded spaces will bite you!
            grep "$line" "$LOC/out.txt" > "$LOC/temp.txt"
            while read val; do
                TSMP=$(echo "$val" | awk -F '"' '{print $2}')
                QUERY=$(echo "$val" | awk -F 'SQLTEXT=' '{print $2}' | sed "s/[\"/]//g")
                echo " TIMESTAMP=$TSMP " >> "$TARGET/$FILE"
                echo " QUERY=$QUERY " >> "$TARGET/$FILE"
                # the original script matched on the first word of the query only;
                # let the shell split it off instead of calling awk again
                read -r RES _ <<< "$QUERY"
                TEXT=$(grep "$RES" "$PERM_LOC/commands.txt")
                if [ -n "$TEXT" ]; then
                    # arithmetic expansion is built into the shell, no need to fork expr
                    NUM=$((NUM + 1))
                    SUB_FILE="$ID-$command-$NUM-$TSMP-$USER@$IP.txt"
                    echo -e "===============\n" > "$CMD_TARGET/$SUB_FILE"
                    echo "FILE = \"$SUB_FILE\"" >> "$CMD_TARGET/$SUB_FILE"
                    ### same way append 6 more lines to $SUB_FILE
                    SUB=$(echo "$WARN_ME" | grep "$command")
                    if [ "$command" == "$VC" ]; then
                        STATE=" very critical "
                    elif [ -z "$SUB" ]; then
                        STATE=" CRITICAL "
                    else
                        STATE=" WARNING "
                    fi
                    if [ "$USER" != "root" -a "$command" != "$VC" ]; then
                        # this should probably be $command instead of command?
                        # oh wait, probably a placeholder statement
                        mail command &
                    elif [ "$USER" == "root" -a -z "$HOST" ]; then
                        mail command &
                    elif [ "$USER" == "root" -a "$command" == "$VC" ]; then
                        mail command &
                    else
                        echo -e "some message \n" >> "$LOC/operations.txt"
                    fi
                fi
            done < "$LOC/temp.txt"
        done < "$LOC/check.txt"
    fi
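Hmm, "shell script only". OK, with that in mind, perhaps you could pre-grep "$LOC/check.txt" and/or "$LOC/temp.txt", so that you can work from the already-grepped output instead of running grep inside the loop.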
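One possible reading of that pre-grep idea is sketched below. It assumes that every line of check.txt is a literal ID string that appears verbatim in the matching records of out.txt; the temporary directory, the sample records and the name prefiltered.txt are made up for illustration and are not taken from the real data.

```shell
#!/usr/bin/env bash
# Sketch: filter out.txt once with ALL IDs, instead of one grep per ID.
# The directory, file contents and record layout are hypothetical samples.
LOC=$(mktemp -d)

printf '%s\n' '"101"' '"102"' > "$LOC/check.txt"
cat > "$LOC/out.txt" <<'EOF'
<AUDIT_RECORD TIMESTAMP="t1" CONNECTION_ID="101" SQLTEXT="select 1"/>
<AUDIT_RECORD TIMESTAMP="t2" CONNECTION_ID="999" SQLTEXT="select 2"/>
<AUDIT_RECORD TIMESTAMP="t3" CONNECTION_ID="102" SQLTEXT="drop table t"/>
EOF

# -F treats each pattern as a fixed string, -f reads the patterns from a file,
# so a single grep process handles every ID in check.txt.
grep -F -f "$LOC/check.txt" "$LOC/out.txt" > "$LOC/prefiltered.txt"

# grep -c '' counts lines without the padding some wc implementations add.
matched=$(grep -c '' "$LOC/prefiltered.txt")
total=$(grep -c '' "$LOC/out.txt")
echo "kept $matched of $total records"
```

The per-ID grep inside the loop could then read the much smaller prefiltered.txt instead of the full out.txt. Whether that helps depends on how many records in out.txt belong to IDs that are not in check.txt.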
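The more I look at it, the more I am convinced that awk could do all of this in a single pass over the data... and handle every entry, not just the first one (as I pointed out in the comments, you really need another loop between the "read line" and "read val" loops).
It would be a long awk script, but definitely doable. And awk is worth knowing: take some time to play with it. It is not that hard, just different. Oh dear, oh dear!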
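To give a feel for the single-pass idea, here is a heavily simplified sketch. It only covers the step that appends TIMESTAMP and QUERY lines to one file per session, and it assumes a made-up record layout in which, when a line is split on double quotes, field 2 is the timestamp, field 4 the connection ID and field 6 the SQL text. The directory names and sample data are hypothetical, so the field numbers would have to be adapted to the real files.

```shell
#!/usr/bin/env bash
# Sketch: one awk process reads check.txt and out.txt once each.
# Paths, sample data and field positions are assumptions for illustration.
LOC=$(mktemp -d)
OUT_DIR="$LOC/sessions"
mkdir -p "$OUT_DIR"

printf '%s\n' '"101"' '"102"' > "$LOC/check.txt"
cat > "$LOC/out.txt" <<'EOF'
<AUDIT_RECORD TIMESTAMP="t1" CONNECTION_ID="101" SQLTEXT="select 1"/>
<AUDIT_RECORD TIMESTAMP="t2" CONNECTION_ID="999" SQLTEXT="select 2"/>
<AUDIT_RECORD TIMESTAMP="t3" CONNECTION_ID="102" SQLTEXT="drop table t"/>
<AUDIT_RECORD TIMESTAMP="t4" CONNECTION_ID="101" SQLTEXT="select 3"/>
EOF

awk -F '"' -v dir="$OUT_DIR" '
    # First file (check.txt): remember every wanted ID, then skip to next line.
    NR == FNR { wanted[$2] = 1; next }
    # Second file (out.txt): handle only records whose ID was remembered.
    ($4 in wanted) {
        file = dir "/" $4 ".txt"
        print " TIMESTAMP=" $2 " " >> file
        print " QUERY=" $6 " "     >> file
        close(file)     # avoid running out of file descriptors with many IDs
        count[$4]++
    }
    # Summary: number of queries seen per ID (array order is unspecified).
    END { for (id in count) print id, count[id] }
' "$LOC/check.txt" "$LOC/out.txt" | sort > "$LOC/summary.txt"

cat "$LOC/summary.txt"
```

Because the ID table lives in an awk array, memory use grows with the number of IDs in check.txt rather than with the size of out.txt, which should stay far below the 3 GB limit for 25,000 entries.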
Answer 2
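The code you posted does not run and is missing a lot of important information. Your additional information does not really explain what exactly your input and your desired output are.
Nevertheless, here is my take on it, with all sed and awk calls removed from the script and the script significantly simplified, so that the performance problem can be debugged properly: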
    #!/usr/bin/env bash
    # Should work using bash 3.2+ and the unrevealed part of your code
    if [ ! -s "$LOC/check.txt" ]; then
        echo "Bummer!"
        exit 1
    fi

    function write_ts () {
        echo "[$(date)]: Started processing ${line}" >> "${SCRIPT_LOC}/running_status.txt"
    }

    function set_and_init_file_targets () {
        if [ "$USER" == "root" ] && [ -z "$DB" ]; then
            TARGET=/data1/sessions/root_sec
            CMD_TARGET=/data1/commands/root_commands
        else
            TARGET=/data1/sessions/user_sec
            CMD_TARGET=/data1/commands/user_commands
        fi
        FILE="${CONNECTION_ID}-${TIMESTAMP}-${USER}@${IP}.txt"
        if [ ! -e "$TARGET/$FILE" ]; then
            # keep whatever follows STATUS="0" in the cleaned-up record
            echo "${res##*STATUS=\"0\"}" > "$TARGET/$FILE"
        fi
    }

    function parse_line () {
        local line=$1
        while read val; do
            res2=${val//[<>\(\)]/}
            # eval executes the record text as shell code: the log must be trusted
            eval ${res2//AUDIT_RECORD/}
            SQLTEXT=${SQLTEXT/%?/}
            echo "TIMESTAMP=$TIMESTAMP" >> "$TARGET/$FILE"
            echo "QUERY=$SQLTEXT" >> "$TARGET/$FILE"
            # grep the sql command by itself
            TEXT=$(grep -i "${SQLTEXT%% *}" "$PERM_LOC/commands.txt")
            if [ -n "$TEXT" ]; then
                NUM=$((NUM + 1))
                SUB_FILE="$CONNECTION_ID-$command-$NUM-$TIMESTAMP-$USER@$IP.txt"
                echo -e "===============\n" > "$CMD_TARGET/$SUB_FILE"
                echo "FILE = \"$SUB_FILE\"" >> "$CMD_TARGET/$SUB_FILE"
                # [... the rest does not make sense at all ...]
            fi
        done < <(grep "$line" "$LOC/out.txt")
    }

    # Main code
    while read line; do
        # log the start of this iteration
        write_ts
        # grep line without quotes
        TST=$(grep -w "${line//\"/}" "$PERM_LOC/id_processing.txt")
        # remove everything besides key=val pairs
        res=${TST//[<>\(\)]/}
        # set the key=val pairs, except AUDIT_RECORD
        eval ${res//AUDIT_RECORD/}
        # set IP to HOST if empty
        : ${IP:="$HOST"}
        # remove nasty / at the end
        DB=${DB/%?/}
        set_and_init_file_targets
        parse_line "$line"
    done < "$LOC/check.txt"
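Having posted this, I am actually not convinced that your performance problem stems solely from the two loops calling awk/sed/grep.
Could you post the first ten lines of ${SCRIPT_LOC}/running_status.txt once your script has been running for an hour or so?
Note that my script snippet is completely untested and may not work as you expect. I did, however, try to follow the semantics of your original script excerpt.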
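The parameter expansions in the script above can also be tried in isolation. The sketch below pulls single attribute values out of a record without forking awk, sed or grep and without eval. The helper name get_attr, the result variable names and the sample record are hypothetical, and the code assumes that every attribute is written as KEY="value" with a space in front of KEY.

```shell
#!/usr/bin/env bash
# Hypothetical helper: store the value of attribute $2 from record $1 in the
# variable named by $3, using only shell builtins (printf -v needs bash 3.1+).
get_attr () {
    local record=$1 key=$2 rest
    case $record in
        *" $key=\""*) ;;            # attribute is present
        *) return 1 ;;              # attribute is missing
    esac
    rest=${record#*" $key=\""}      # drop everything up to and including KEY="
    printf -v "$3" '%s' "${rest%%\"*}"   # keep the text before the closing quote
}

# Made-up sample record with an empty IP attribute.
record='<AUDIT_RECORD TIMESTAMP="t1" USER="root" HOST="db01" IP="" DB="sales"/>'

get_attr "$record" USER session_user
get_attr "$record" HOST session_host
get_attr "$record" IP   session_ip
get_attr "$record" DB   session_db

# Same fallback as in the script: use the host when the IP is empty.
: "${session_ip:=$session_host}"

echo "user=$session_user host=$session_host ip=$session_ip db=$session_db"
```

Compared with eval, this never executes text from the log, at the cost of one function call per attribute. Values that themselves contain a double quote are not handled by this sketch.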