bash - All possible combinations of words from different files


I have n files, each containing one word per line:

file1  file2  file3  ...
1_a    2_a    3_a
1_b    2_b    3_b
1_c           3_c

I want to write a bash script that takes all of these files and generates every possible combination of n words (one word from each file).

For my example, I want this result:

1_a 2_a 3_a
1_a 2_a 3_b
1_a 2_a 3_c
1_a 2_b 3_a
1_a 2_b 3_b
1_a 2_b 3_c
1_b 2_a 3_a
1_b 2_a 3_b
1_b 2_a 3_c
1_b 2_b 3_a
1_b 2_b 3_b
1_b 2_b 3_c
1_c 2_a 3_a
1_c 2_a 3_b
1_c 2_a 3_c
1_c 2_b 3_a
1_c 2_b 3_b
1_c 2_b 3_c

I tried to do this with paste and awk, but failed. How can I do it?

Answer 1

You can use a recursive function that calls itself as long as there are files left to process:

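#!/bin/bash

# Print every combination of one line from each file given as an argument.
process () {
    local prefix=$1
    local file=$2
    local line
    shift 2
    # IFS= and -r keep each line verbatim; the [[ -n $line ]] test also
    # handles a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]] ; do
        if (($#)) ; then                  # There are still unprocessed files.
            process "${prefix:+$prefix }$line" "$@"
        else                              # Reading the last file.
            printf '%s\n' "${prefix:+$prefix }$line"
        fi
    done < "$file"
}

# ${prefix:+...} avoids a leading space, since the first prefix is empty.
process '' "$@"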
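For comparison, here is a minimal non-recursive sketch; it is not part of the answer, and the function name combine is hypothetical. The recursive version reopens each later file once per line of the files before it, so it relies on regular files. This sketch instead reads each file once with mapfile (bash 4 or later is assumed) and extends a list of partial combinations one file at a time.

```shell
#!/bin/bash
# Hypothetical iterative alternative to the recursive process function.
# acc holds the partial combinations built so far; it starts with one empty entry.
combine () {
    local -a acc=('') next lines
    local file prefix line
    for file in "$@"; do
        mapfile -t lines < "$file"          # read this file once, newlines stripped
        next=()
        for prefix in "${acc[@]}"; do       # extend every partial combination...
            for line in "${lines[@]}"; do   # ...with every word of the current file
                next+=("${prefix:+$prefix }$line")
            done
        done
        acc=("${next[@]}")
    done
    # If some file was empty there are no combinations, so print nothing.
    if (( ${#acc[@]} )); then
        printf '%s\n' "${acc[@]}"
    fi
}

# Run on the files named on the command line, if any.
if (( $# )); then
    combine "$@"
fi
```

Run with the three example files as arguments, it should print the same 18 lines as in the question, in the same order, because the first file varies slowest and the last file fastest.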

Answer 2

parallel --line-buffer --keep-order echo :::: file1 :::: file2 :::: file3

https://www.gnu.org/software/parallel/parallel_tutorial.html#multiple-input-sources

Answer 3

I know you said bash, but this is a great fit for something like Python 3.3+:

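import sys
from contextlib import ExitStack
from itertools import product

# Open every file named on the command line; ExitStack closes them all at the end.
with ExitStack() as stack:
    files = [stack.enter_context(open(f)) for f in sys.argv[1:]]
    # product() yields one tuple per combination, taking one line from each file.
    for x in product(*files):
        x = [y.rstrip('\n') for y in x]  # drop the trailing newlines
        print(*x)                        # print the words separated by spaces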

Save the code above in a file named combo.py and run it as python combo.py file_1 file_2 file_3, which produces:

1_a 2_a 3_a
1_a 2_a 3_b
1_a 2_a 3_c
1_a 2_b 3_a
1_a 2_b 3_b
1_a 2_b 3_c
1_b 2_a 3_a
1_b 2_a 3_b
1_b 2_a 3_c
1_b 2_b 3_a
1_b 2_b 3_b
1_b 2_b 3_c
1_c 2_a 3_a
1_c 2_a 3_b
1_c 2_a 3_c
1_c 2_b 3_a
1_c 2_b 3_b
1_c 2_b 3_c

Answer 4

Brace expansion in bash is a suitable tool for this job. Consider a simple case such as:

$ echo {1..3}{a..c}
1a 1b 1c 2a 2b 2c 3a 3b 3c

For your example, you would have something like this:

$ echo {1_a,1_b,1_c}{2_a,2_b}{3_a,3_b,3_c}
1_a2_a3_a 1_a2_a3_b 1_a2_a3_c 1_a2_b3_a 1_a2_b3_b 1_a2_b3_c 1_b2_a3_a 1_b2_a3_b 1_b2_a3_c 1_b2_b3_a 1_b2_b3_b 1_b2_b3_c 1_c2_a3_a 1_c2_a3_b 1_c2_a3_c 1_c2_b3_a 1_c2_b3_b 1_c2_b3_c

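This is correct, but hard to read. To display it more clearly, you can store the generated words in an array and then print the array one element per line: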

$ combos=({1_a,1_b,1_c}{2_a,2_b}{3_a,3_b,3_c})
$ for i in "${combos[@]}"; do echo "$i"; done
1_a2_a3_a
1_a2_a3_b
1_a2_a3_c
1_a2_b3_a
1_a2_b3_b
1_a2_b3_c
1_b2_a3_a
1_b2_a3_b
1_b2_a3_c
1_b2_b3_a
1_b2_b3_b
1_b2_b3_c
1_c2_a3_a
1_c2_a3_b
1_c2_a3_c
1_c2_b3_a
1_c2_b3_b
1_c2_b3_c

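The for loop above can also be replaced by a single printf call: printf reuses its format string for each remaining argument, so it prints one array element per line.

```shell
#!/bin/bash
# Same brace expansion as above, stored in an array.
combos=({1_a,1_b,1_c}{2_a,2_b}{3_a,3_b,3_c})
# printf applies '%s\n' to every element in turn: one combination per line.
printf '%s\n' "${combos[@]}"
```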
There are many ways to put a space between the elements of each combination so that the output looks like:

1_a 2_a 3_a
..
..

But that is a separate question, which you can ask on its own.
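As one possible approach (a sketch, not taken from the answer; the array name spaced is arbitrary), you can put a quoted space between the brace groups. Brace expansion keeps the quoted space as part of each generated word, and because it is quoted, the result is not split, so every array element is one spaced combination.

```shell
#!/bin/bash
# The quoted ' ' between brace groups ends up inside every generated word,
# e.g. "1_a 2_a 3_a", and quoting prevents it from being word-split.
spaced=({1_a,1_b,1_c}' '{2_a,2_b}' '{3_a,3_b,3_c})
printf '%s\n' "${spaced[@]}"
```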
