Bash script: parse a folder of text files daily without duplicate output

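Answer 1

You asked:

"What is the most efficient way to process the files in a folder daily without producing duplicate output in the log file, and without moving or altering the files once they have been processed?"

One way to achieve this is to store a sorted list of the files that have already been processed. Running comm against the candidate list of files to process then strips out the ones that were already handled.

As an illustration of the technique, something like this could serve as a base for handling well-formed filenames, such as those generated by HylaFax: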

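# Build a sorted list of everything currently in the directory
find * -print | sort > /tmp/current_files

# First run: nothing to compare against yet, so save today's list as the baseline
test -f /tmp/previous_files || { mv -f /tmp/current_files /tmp/previous_files; echo "Come back tomorrow"; exit 0; }

# Keep only the names that appear in the current list but not in the previous one
comm -13 /tmp/previous_files /tmp/current_files > /tmp/new_files

# ... Process entries in /tmp/new_files ...

# Today's list becomes the baseline for the next run
mv -f /tmp/current_files /tmp/previous_files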
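To make the role of comm -13 concrete, here is a minimal, self-contained sketch that runs it on two scratch lists. The function name list_new_files and the sample file names are hypothetical, introduced only for illustration. It assumes both lists hold one name per line and were sorted in the same locale.

```shell
#!/bin/bash
# Hypothetical helper: print names that appear in the current list
# but not in the previous one. Both files must be sorted the same way.
list_new_files() {
    local previous=$1 current=$2
    # -1 suppresses lines unique to the previous list,
    # -3 suppresses lines common to both lists,
    # leaving only the lines unique to the current list.
    comm -13 "$previous" "$current"
}

# Demonstration with scratch files (the names below are made up).
workdir=$(mktemp -d)
printf '%s\n' q100 q101 q102 | sort > "$workdir/previous_files"
printf '%s\n' q101 q102 q103 q104 | sort > "$workdir/current_files"

new_files=$(list_new_files "$workdir/previous_files" "$workdir/current_files")
echo "$new_files"   # prints q103 and q104, one per line

rm -rf "$workdir"
```

Using mktemp -d for the scratch directory, rather than fixed names under /tmp, is a choice made for this sketch so that repeated or concurrent runs do not collide.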

Answer 2

This may not be the most elegant solution, but it is the workaround I came up with:

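#!/bin/bash

processed_log=/scripts/processed_log.txt
touch "$processed_log"   # make sure the log exists on the first run

for i in /var/spool/hylafax/doneq/q*
do
    [ -f "$i" ] || continue   # the glob matched nothing

    # Skip files already recorded. -x matches whole lines and -F treats
    # the name literally, so one path cannot match as a prefix of another.
    if grep -qxF -- "$i" "$processed_log"; then
        continue
    fi

    echo "New! Going to add $i to the log"
    echo "$i" >> "$processed_log"
    user=$(grep "mailaddr" "$i" | sed 's/mailaddr://g')
    pgs=$(grep "npages" "$i" | sed 's/npages://g')
    echo "$i $user - $pgs pages" >> /scripts/log_output.txt
done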

I still need to add logic that removes entries from processed_log.txt for files that no longer exist in the spool directory. sed should handle that nicely.
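The pruning step mentioned above could be sketched roughly as follows. Note that this sketch swaps in a plain read loop instead of sed, since the check is about whether each file exists rather than about text patterns. The function name prune_processed_log is hypothetical, and the sketch assumes the log holds one full path per line.

```shell
#!/bin/bash
# Hypothetical helper: drop entries from a processed-files log
# when the file they name no longer exists.
prune_processed_log() {
    local log=$1 tmp path
    tmp=$(mktemp) || return 1
    # Copy across only the entries whose file is still present.
    while IFS= read -r path; do
        [ -e "$path" ] && printf '%s\n' "$path"
    done < "$log" > "$tmp"
    # Overwrite the log in place so its permissions are kept.
    cat "$tmp" > "$log"
    rm -f "$tmp"
}

# Demonstration in a scratch directory (the paths here are made up).
spool=$(mktemp -d)
touch "$spool/q1" "$spool/q3"
printf '%s\n' "$spool/q1" "$spool/q2" "$spool/q3" > "$spool/processed_log.txt"

prune_processed_log "$spool/processed_log.txt"
remaining=$(cat "$spool/processed_log.txt")
echo "$remaining"   # only the q1 and q3 entries remain

rm -rf "$spool"
```

In the script above, this would be called once per run on /scripts/processed_log.txt, either before or after the main loop.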
