Recursively search for files whose first line contains a specific combination of strings

Answer 1

A find + awk solution:

find ./backup -type f -exec \
awk 'NR == 1{ if (/StockID.*SellPrice/) print FILENAME; exit }' {} \;

If the key words can appear in a different order, replace the pattern /StockID.*SellPrice/ with /StockID/ && /SellPrice/.
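To see the difference between the two patterns, here is a small self-contained sh demo. The ./backup tree and its file contents are made up for illustration; it runs the per-file command above once with each pattern.

```shell
#!/bin/sh
# Hypothetical sample tree: three files under a throwaway ./backup directory.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir -p backup/sub
printf 'StockID,Name,SellPrice\nrow\n' > backup/a.csv   # both words, in order
printf 'SellPrice,StockID\n' > backup/sub/b.csv         # both words, reversed
printf 'Name\nStockID,SellPrice\n' > backup/c.csv       # words only on line 2

# Ordered pattern: one awk per file, which exits after checking line 1.
ordered=$(find ./backup -type f -exec \
  awk 'NR == 1{ if (/StockID.*SellPrice/) print FILENAME; exit }' {} \; | sort)

# Order-independent pattern: both words anywhere on line 1.
anyorder=$(find ./backup -type f -exec \
  awk 'NR == 1{ if (/StockID/ && /SellPrice/) print FILENAME; exit }' {} \; | sort)

printf 'ordered:\n%s\nany order:\n%s\n' "$ordered" "$anyorder"
```

Only a.csv matches the ordered pattern; the && form also finds sub/b.csv, and c.csv is never reported because only line 1 is checked.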

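If the number of files is huge, a more efficient alternative is the following (it processes files in batches, so the total number of awk invocations is far smaller than the number of files):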

find ./backup -type f -exec \
awk 'FNR == 1 && /StockID.*SellPrice/{ print FILENAME }{ nextfile }' {} +
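A rough way to check the batching claim: with -exec ... {} +, find passes as many paths as fit on one command line to each invocation. This sketch builds a made-up tree of 51 files, counts how often the command is started, and then runs the batched awk command. It assumes an awk that supports nextfile, which current gawk, mawk, BusyBox awk and the BSD awk provide.

```shell
#!/bin/sh
# Hypothetical tree of 51 files; assumes an awk that supports nextfile.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir backup
i=1
while [ "$i" -le 50 ]; do
  printf 'Name,Qty\n' > "backup/f$i.csv"    # first line does not match
  i=$((i + 1))
done
printf 'StockID,Name,SellPrice\n' > backup/match.csv

# Count how many times find starts a command when using {} +.
runs=$(find ./backup -type f -exec sh -c 'echo run' sh {} + | awk 'END { print NR }')

# The batched awk: check line 1 of each file, then skip to the next file.
matches=$(find ./backup -type f -exec \
  awk 'FNR == 1 && /StockID.*SellPrice/{ print FILENAME }{ nextfile }' {} +)

echo "files: 51, command runs: $runs"
echo "matches: $matches"
```

On a typical system the 51 short paths fit in a single invocation, whereas the \; form starts awk 51 times.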

Answer 2

With GNU grep or a compatible implementation:

grep -Hrnm1 '^' ./backup | sed -n 's/:1:.*StockID.*SellPrice.*//p'

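The recursive grep prints the first line of each file as filename:1:line without reading the whole file (the -m1 flag makes it stop at the first match, and since '^' matches every line, that is line 1). sed then prints the filename for those entries whose line part matches the pattern.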

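This breaks for file names that themselves contain :1: or a newline, but that is a risk worth taking compared with a slow find + awk combination that runs another process for every file.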
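A small demo of the pipeline on a made-up tree, assuming GNU grep. It prints the intermediate filename:1:line records, filters them, and then reproduces the :1: caveat. The sed expression used here, s/:1:.*StockID.*SellPrice.*//p, applies the pattern only to the text after the first :1:, so a file name that merely contains the key words is not reported on its own.

```shell
#!/bin/sh
# Hypothetical sample tree; assumes GNU grep (or a grep with -r, -H, -n, -m).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir backup
printf 'StockID;SellPrice\nmore\n' > backup/yes.txt
printf 'header\nStockID;SellPrice\n' > backup/no.txt

# '^' matches every line, so -m1 stops reading each file after line 1.
records=$(grep -Hrnm1 '^' ./backup | sort)
printf '%s\n' "$records"

# Test only the text after ':1:', then delete from ':1:' on to keep the name.
found=$(printf '%s\n' "$records" | sed -n 's/:1:.*StockID.*SellPrice.*//p')
echo "found: $found"

# The caveat: sed cuts at the first ':1:', which here is inside the name.
printf 'StockID SellPrice\n' > 'backup/odd:1:name.txt'
odd=$(grep -Hnm1 '^' 'backup/odd:1:name.txt' |
  sed -n 's/:1:.*StockID.*SellPrice.*//p')
echo "odd: $odd"
```

The last file is reported as backup/odd instead of its real name, which is exactly the failure mode described above.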

Answer 3

To avoid both running one command per file and reading whole files, with GNU awk:

(unset -v POSIXLY_CORRECT; exec find backup/ -type f -exec gawk '
  /\<StockID\>/ && /\<SellPrice\>/ {print FILENAME}; {nextfile}' {} +)
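gawk's \< and \> are GNU extensions. As a hedged alternative for other awks, whole words can be approximated with bracket expressions, as in this sketch (made-up sample files; assumes an awk with nextfile and POSIX character classes). Word characters are taken to be letters, digits and underscore, which is what gawk treats as word constituents.

```shell
#!/bin/sh
# Portable stand-in for gawk's \< \>: "whole word" is emulated by requiring a
# non-word character, or the start/end of the line, on each side.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir backup
printf 'SellPrice StockID\n' > backup/words.txt      # both whole words
printf 'StockIDs SellPriceX\n' > backup/partial.txt  # only as prefixes

found=$(find backup -type f -exec awk '
  FNR == 1 {
    if ($0 ~ /(^|[^[:alnum:]_])StockID([^[:alnum:]_]|$)/ &&
        $0 ~ /(^|[^[:alnum:]_])SellPrice([^[:alnum:]_]|$)/)
      print FILENAME
    nextfile
  }' {} +)
echo "$found"
```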

Or with zsh:

set -o rematchpcre # where we know for sure \b is supported
for file (backup/**/*(ND.)) {
  IFS= read -r line < $file &&
   [[ $line =~ "\bStockID\b" ]] &&
   [[ $line =~ "\bSellPrice\b" ]] &&
   print -r $file
}

Or:

set -o rematchpcre
print -rl backup/**/*(D.e:'
  IFS= read -r line < $REPLY &&
   [[ $line =~ "\bStockID\b" ]] &&
   [[ $line =~ "\bSellPrice\b" ]]':)
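Where zsh is not available, the same read-only-the-first-line idea can be sketched in POSIX sh with case patterns. This is a swapped-in variant, not the zsh code: glob patterns cannot express word boundaries, so it only checks that both substrings occur, in any order. The sample files are made up; like the D glob qualifier, find also descends into hidden directories.

```shell
#!/bin/sh
# Read only the first line of each file and test it with case patterns.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir -p backup/.hidden
printf 'SellPrice,StockID\n' > backup/.hidden/a.csv  # hidden dirs are searched too
printf 'StockID only\n' > backup/b.csv

found=$(find backup -type f -exec sh -c '
  for file do
    IFS= read -r line < "$file" || continue   # skip files with no complete line
    case $line in
      *StockID*)
        case $line in
          *SellPrice*) printf "%s\n" "$file" ;;
        esac ;;
    esac
  done' sh {} +)
echo "$found"
```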

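Or with bash, on systems whose native extended regular expressions support the \< and \> word-boundary operators (on other systems you can also try [[:<:]]/[[:>:]] or \b):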

RE1='\<StockID\>' RE2='\<SellPrice\>' find backup -type f -exec bash -c '
  for file do
    IFS= read -r line < "$file" &&
    [[ $line =~ $RE1 ]] &&
    [[ $line =~ $RE2 ]] &&
    printf "%s\n" "$file"
  done' bash {} +
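A quick demo of the bash loop against two made-up files, assuming bash is installed and the system's regex library supports \< and \> (glibc does). A file where StockID only appears as the prefix of a longer word is not reported.

```shell
#!/bin/sh
# Demo of the bash [[ =~ ]] loop; sample files are hypothetical.
set -u
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir backup
printf 'StockID;SellPrice\n' > backup/whole.txt
printf 'StockIDX;SellPrice\n' > backup/prefix.txt   # StockID only as a prefix

# No set -e here: the loop's exit status is that of the last && chain,
# so find may legitimately report a non-zero status.
found=$(RE1='\<StockID\>' RE2='\<SellPrice\>' find backup -type f -exec bash -c '
  for file do
    IFS= read -r line < "$file" &&
    [[ $line =~ $RE1 ]] &&
    [[ $line =~ $RE2 ]] &&
    printf "%s\n" "$file"
  done' bash {} +)
echo "$found"
```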

Answer 4

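grep -E (egrep) + awk, requiring both words in either order:

grep -EHrn 'StockID.*SellPrice|SellPrice.*StockID' ./backup | awk -F ':' '$2==1{print $1}'

Unlike the answers above, this reads every file in full, and it does not handle file names that contain ':'.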
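To illustrate the difference between a plain alternation and a pattern that requires both words, this sketch (assuming GNU grep; the files are made up) runs both variants on a two-file tree. With 'StockID|SellPrice', a first line containing only one of the words is already enough to be reported.

```shell
#!/bin/sh
# Hypothetical two-file tree; assumes GNU grep.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir backup
printf 'StockID,SellPrice\n' > backup/both.csv
printf 'StockID,Name\n' > backup/one.csv

# Alternation: a first line with either word is enough.
either=$(grep -EHrn 'StockID|SellPrice' ./backup | awk -F ':' '$2==1{print $1}' | sort)

# Both words, in either order.
both=$(grep -EHrn 'StockID.*SellPrice|SellPrice.*StockID' ./backup |
  awk -F ':' '$2==1{print $1}')

printf 'either:\n%s\nboth:\n%s\n' "$either" "$both"
```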
