A better, more efficient bash script (grep)

2024-6-3

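I have a script I built. It works really well, but it is projected to take 4 days to run! I'm wondering whether there is a more efficient way to do this.

Here is what the script does:

It gets every file on the image server and writes the list to imageserver.txt
It formats the file paths for grep
It loops through imageserver.txt and greps /var/www/html for each line
It writes 2 formatted files (exists and does not exist) for later use
It writes a log file so the script's progress can be followed with tail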

I have 2 files.

imageserver.txt (about 250,000 lines)

imageserver/icons/socialmedia/sqcolor_tumblr.png
imageserver/icons/socialmedia/sqcolor_gaf.png
imageserver/icons/socialmedia/sqcolor_yelp.png
imageserver/icons/socialmedia/sqcolor_linkedin.png
imageserver/icons/socialmedia/sqcolor_twitter.png
imageserver/icons/socialmedia/sqcolor_angies.png
imageserver/icons/socialmedia/sqcolor_houzz.png

search.sh

#!/bin/bash

printf '\n\n Started ...\n\n'

# Clear runtime files
> doesExist.txt
> nonExists.txt
> imgSearch.log

printf '\n\n Building Image List ...\n\n'

# Write the contents of the image server to imageserver.txt,
# removing the leading /var/www/ so each line is the path grep should look for
find /var/www/imageserver/ -type f | sed 's|^/var/www/||' > imageserver.txt
printf '\n\n Finished Building, Start Searching ...\n\n'

linecount=$(wc -l < ./imageserver.txt)

while IFS= read -r var
do
    printf '%s\n\n' "$linecount"
    printf '\n ... Searching %s\n' "$var"

    # -F: match the path literally; otherwise every '.' in it is a regex wildcard
    if results=$(grep -rlF -- "$var" /var/www/html); then
        printf 'Image exists ...\n'
        echo "$var|||$results^^^" >> doesExist.txt
        echo "$linecount | YES | $var " >> imgSearch.log
    else
        printf 'Image does not exist ...\n'
        echo "$var" >> nonExists.txt
        echo "$linecount | NO | $var " >> imgSearch.log
    fi

    # Count down so imgSearch.log shows how many paths remain
    linecount=$((linecount-1))
done < ./imageserver.txt

printf '\n\n -- FINISHED -- \n\n'

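Basically, I'm checking whether each image is used by any of the HTML in the /var/www/html directory.

That said, each grep iteration takes about 0.5 to 1 second. By my calculation that comes to 3 to 4 days, which seems excessive to me. Is there a better (more efficient) way to accomplish this?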

Answer 1

The speed of bash itself is not your problem.

You are grepping every file under /var/www/html 250,000 times!

You need to replace the while loop with:

grep -rl -F -f ./imageserver.txt /var/www/html > grep_output

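Then you would parse that output file to get your statistics. That will be tricky, because with -l grep only lists the HTML files that contain at least one of the paths, not which path matched, but it won't take 4 days.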

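Or, perhaps more simply:

grep -rhoF -f ./imageserver.txt /var/www/html | sort -u > used_images.txt

to get the list of images that are used (-h drops the file names and -o prints only the matched path). You can then use comm to compare that list with imageserver.txt and find the images that are not used; comm expects both inputs to be sorted the same way.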
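The used-images list and the comm comparison can be sketched as two small bash functions. The names list_used_images and list_unused_images are hypothetical, and the sketch assumes GNU grep. Both inputs to comm go through the same sort, so they are ordered consistently; a temporary file is used instead of process substitution so the functions also run under plain sh.

```shell
#!/bin/bash
# Sketch of the "grep -o, then comm" approach.
# Assumption: GNU grep; both function names are hypothetical.

# list_used_images LIST_FILE HTML_DIR
# Every path from LIST_FILE that appears under HTML_DIR, sorted, no duplicates.
list_used_images() {
    # -h: omit file names, -o: print only the matched path,
    # -F -f: treat each line of LIST_FILE as a fixed string
    grep -rhoF -f "$1" -- "$2" | sort -u
}

# list_unused_images LIST_FILE HTML_DIR
# Paths from LIST_FILE that never appear under HTML_DIR.
list_unused_images() {
    local sorted
    sorted=$(mktemp) || return 1
    sort -u "$1" > "$sorted"
    # -23: keep only lines unique to the first input (the full image list);
    # "-" makes comm read the used-images list from stdin
    list_used_images "$1" "$2" | comm -23 "$sorted" -
    rm -f "$sorted"
}
```

For example, list_unused_images ./imageserver.txt /var/www/html > nonExists.txt would give the equivalent of the original nonExists.txt with a single grep over /var/www/html.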
