Bash - Filter rows where a certain proportion of the columns is occupied

Answer 1

How far would this get you?

awk 'gsub(/-/, "&") < 2' file
ID       Ct       1          2          3          4           5             6
3        0        consensus  consensus  consensus  consensus   consensus     consensus
5        0        -          AT         AT         GC          GC            AT
8        0        consensus  consensus  consensus  -           consensus     consensus

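Note that you didn't say anything regarding the desired output. Do you want a single output file, file names prefixed to the output lines, new files with names similar to the originals, or something else?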

EDIT (after the comment about new file names):

awk 'gsub(/-/, "&") < 2 {print > (FILENAME ".new")}' /path/to/file/*
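
The one-liners above rely on gsub returning the number of substitutions it made: replacing every - with & (the matched text) leaves the line untouched, so the return value is just a hyphen count. A minimal sketch of that idea follows, assuming a made-up sample file and a hypothetical helper name count_hyphens; neither comes from the original post.

```shell
# Hypothetical sample data, modelled on the output shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID Ct 1 2 3 4 5 6
3 0 consensus consensus consensus consensus consensus consensus
4 0 - - AT GC - AT
5 0 - AT AT GC GC AT
EOF

# gsub() returns the number of replacements it made. Replacing each "-"
# with "&" (the matched text itself) leaves the line unchanged, so the
# return value is simply the number of hyphens on the line.
count_hyphens() {
    awk '{ n = gsub(/-/, "&"); print n ": " $0 }' "$1"
}

count_hyphens "$sample"
```

Because the pattern is matched against the whole line, a hyphen inside a field (a negative number, say) is counted too. If that matters for your data, a per-field comparison is probably the safer choice.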

Answer 2

If all the files are in the same directory, you can use a for loop with a glob to iterate over the files and run the awk command on each one:

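for file in /path/to/files/*; do
    awk '{
        count = 0                      # number of "-" fields seen on this line
        for (i = 3; i <= 8; i++) {     # columns 3-8 hold the data
            if ($i == "-") {
                count++
            }
        }
        if (count <= 1) {              # keep lines with at most one "-"
            print
        }
    }' "$file"
done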

For each line, it loops over columns 3-8 and increments count whenever a column's value equals -. If count for a line is greater than 1, the line is not printed.
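
If the number of columns varies between files, the same per-field count can be written without hardcoding column 8. The sketch below is one possible variant, not the answer's own code: the helper name filter_rows, the variables first and max, and the sample data are all assumptions made for illustration.

```shell
# Hypothetical sample data; the row with ID 4 has three "-" fields.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID Ct 1 2 3 4 5 6
3 0 consensus consensus consensus consensus consensus consensus
4 0 - - AT GC - AT
5 0 - AT AT GC GC AT
EOF

# first = first data column, max = largest allowed number of "-" fields.
# Both names are illustrative, not part of the original answer.
filter_rows() {
    awk -v first=3 -v max=1 '{
        count = 0
        for (i = first; i <= NF; i++)   # NF: number of fields on this line
            if ($i == "-") count++
        if (count <= max) print
    }' "$@"
}

filter_rows "$sample"
```

Since awk accepts several file names at once, filter_rows /path/to/files/* would process every file in a single awk invocation, at the cost of merging all the output into one stream.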

Answer 3

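Perl is handy for this sort of thing. In particular, it lets you grep over the fields without an explicit loop, and the result (when evaluated in scalar context) gives the number of matches. For example, to keep lines with fewer than three "-" fields from the third column onward: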

$ perl -lane 'print if 3 > grep { $_ eq "-" } splice @F, 2' file
ID       Ct       1          2          3          4           5             6
3        0        consensus  consensus  consensus  consensus   consensus     consensus
5        0        -          AT         AT         GC          GC            AT
8        0        consensus  consensus  consensus  -           consensus     consensus
