Search file A using terms from file B, and save the output to individual TXT files named after the search terms in file B

Answer 1

Grep + Xargs

xargs -d '\n' sh -c '
    for term; do grep "$term" fileA > "$term.txt"; done
' xargs-sh < fileB

(Improved by cas.)
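A note on how the command above is wired together, as far as it can be read from the code: the word xargs-sh after the quoted script only fills $0 of the inner shell, and the lines that xargs appends become the positional parameters that "for term" walks over (a for loop without "in" iterates over "$@"). The -d option is a GNU xargs extension, so the sketch below imitates the command line xargs would build, passing two made-up terms by hand instead.

```shell
# Imitates what xargs would run: "xargs-sh" becomes $0,
# the remaining words become $1, $2, ... (the sample terms are made up).
listing=$(sh -c '
    for term; do printf "%s -> %s.txt\n" "$term" "$term"; done
' xargs-sh 'foo bar' baz)
printf '%s\n' "$listing"
```

A term containing a space stays a single argument, which is why the original quotes "$term" everywhere.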

Grep + shell

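Using a shell loop to process a file is generally bad practice, but here fileB is much smaller than fileA, so the loop will not noticeably hurt performance.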

while IFS= read -r term; do
    grep "$term" fileA > "$term.txt"
done < fileB
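For trying the loop out safely, here is a self-contained sketch on made-up data. The demo_loop directory and the sample lines are assumptions for illustration; the only change to the loop itself is grep -e, which keeps a term that starts with a dash from being parsed as an option.

```shell
# Scratch directory and sample data are hypothetical, not from the original post.
dir=demo_loop
mkdir -p "$dir"
printf '%s\n' 'apple pie' 'banana split' 'apple tart' 'cherry' > "$dir/fileA"
printf '%s\n' 'apple' 'cherry' > "$dir/fileB"

# One grep run per term; the matches for each term land in "<term>.txt".
while IFS= read -r term; do
    grep -e "$term" "$dir/fileA" > "$dir/$term.txt"
done < "$dir/fileB"

head "$dir"/*.txt
```

Each term is still treated as a regular expression here; grep -F -e would match it as a literal string instead.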

awk

awk 'NR==FNR{pat[$0];next}{for(term in pat){if($0~term){print>term}}}' fileB fileA

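NR==FNR{pat[$0];next} reads the first file given as an argument and stores each line as a key of the array pat.
{for(term in pat){if($0~term){print>term}}} is fairly self-explanatory: for each term in the array pat, test whether the current line matches that term (as a regular expression) and, if so, print the line to a file named after the term.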
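The same one-liner, spread over several lines with comments, may be easier to follow. This is a sketch on made-up data: the demo_awk directory, the sample lines, and the ".txt" suffix on the output names are assumptions added here (the one-liner above writes to a file named exactly after the term).

```shell
# Scratch directory and sample data are hypothetical.
dir=demo_awk
mkdir -p "$dir"
printf '%s\n' 'apple pie' 'banana split' 'apple tart' 'cherry' > "$dir/fileA"
printf '%s\n' 'apple' 'cherry' > "$dir/fileB"

awk -v dir="$dir" '
    # NR equals FNR only while the first file (fileB) is being read:
    # remember every term as an array key, then skip to the next line.
    NR == FNR { pat[$0]; next }

    # Every later line comes from fileA: test it against each stored term.
    {
        for (term in pat)
            if ($0 ~ term)
                print > (dir "/" term ".txt")
    }
' "$dir/fileB" "$dir/fileA"

head "$dir"/*.txt
```

The parentheses around the output name are deliberate: an unparenthesized concatenation after > is not parsed the same way by every awk.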

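Not all awk implementations allow many files to be open at the same time. One way around this, as suggested by Ed Morton, is to use the close statement together with the append operator: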

awk 'NR==FNR{pat[$0];next}{for(term in pat){if($0~term){print>>term;close(term)}}}' fileB fileA
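One consequence of the append operator is that a second run adds to whatever the first run left behind. Below is a minimal sketch of one way to handle that, assuming it is acceptable to delete the old output first. The demo_append directory, the sample data, the run_once helper, and the ".txt" suffix are all made up for illustration.

```shell
# Scratch directory and sample data are hypothetical.
dir=demo_append
mkdir -p "$dir"
printf '%s\n' 'apple pie' 'banana split' 'apple tart' 'cherry' > "$dir/fileA"
printf '%s\n' 'apple' 'cherry' > "$dir/fileB"

run_once() {
    # >> appends, so clear the output of any earlier run first.
    while IFS= read -r term; do
        rm -f "$dir/$term.txt"
    done < "$dir/fileB"

    awk -v dir="$dir" '
        NR == FNR { pat[$0]; next }
        {
            for (term in pat)
                if ($0 ~ term) {
                    out = dir "/" term ".txt"
                    print >> out
                    close(out)   # at most one output file open at a time
                }
        }
    ' "$dir/fileB" "$dir/fileA"
}

run_once
run_once   # a second run still leaves each match only once
wc -l "$dir"/*.txt
```

Opening and closing a file for every matching line is slower than keeping it open, which is the trade-off for staying under the open-file limit.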

Answer 2

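This should be efficient. A first pass with grep -F (which is very fast) outputs only the lines of FILE-A that match some line of FILE-B, which is probably far fewer than the original 100,000 lines of FILE-A, so the awk script has far fewer lines to compare against. awk can then loop over those stored lines while reading FILE-B, rather than looping over the FILE-B lines while reading FILE-A. That lets it open and close just one output file per line of FILE-B, instead of opening and closing an output file once for every FILE-B line times every FILE-A line that matches it, which is what would otherwise be needed to avoid a potential "too many open files" error.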

$ cat tst.sh
#!/usr/bin/env bash

grep -F -f 'FILE-B' 'FILE-A' |
awk '
    NR==FNR{ lines[++numLines]=$0; next }
    {
        close(out)
        out = $0 ".txt"
        for (i=1; i<=numLines; i++) {
            line = lines[i]
            if (index(line,$0)) {
                print line > out
            }
        }
    }
' - 'FILE-B'

$ ./tst.sh

$ head -100 *.txt
==> 1.txt <==
123
1239870
41967849
910

==> 2.txt <==
123
1239870
2349878
39742366876
2378
6723

==> 23.txt <==
123
1239870
2349878
39742366876
2378
6723

==> 78.txt <==
45678
2349878
41967849
789
2378
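
To see what the first pass contributes on its own, the grep -F step can be run separately. The sketch below uses a few made-up lines (not the original FILE-A and FILE-B, whose contents are not shown here) in a hypothetical demo_prefilter directory.

```shell
# Scratch directory and sample data are hypothetical.
dir=demo_prefilter
mkdir -p "$dir"
printf '%s\n' '123' '45678' '555' '910' > "$dir/FILE-A"
printf '%s\n' '1' '78' > "$dir/FILE-B"

# -F: treat each pattern as a fixed string; -f: read one pattern per line.
# Only lines containing at least one term survive, in FILE-A order.
candidates=$(grep -F -f "$dir/FILE-B" "$dir/FILE-A")
printf '%s\n' "$candidates"
```

Here 555 is dropped before awk ever sees it; awk then only has to work out which term each surviving line belongs to.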

