Keep only the columns whose column numbers in the first file match the column numbers in the second file

Answer 1

Assuming that genotype.file is tab-delimited:

cut -f $(tr '\n' ',' <count.file.1 | sed 's/,$//') genotype.file

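The command substitution $( tr ... | sed ... ) generates the comma-separated list of column numbers that cut should extract from the input file. tr replaces every newline in count.file.1 with a comma, and sed removes the superfluous trailing comma.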

Given the example data, the resulting command would look like this:

cut -f 51,92,166,169,196,199,213,228,229,284,291,297 genotype.file
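To check the list before handing it to cut, the command substitution can be run on its own. The sketch below uses a small made-up count.file.1 in a scratch directory (the sample values and the variable names are assumptions, not data from the question). It also shows paste -s as one possible alternative to the tr | sed pair, since it joins lines without leaving a trailing delimiter.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmpdir=$(mktemp -d)
printf '%s\n' 2 4 5 >"$tmpdir/count.file.1"

# tr turns every newline into a comma; sed strips the trailing comma.
cols=$(tr '\n' ',' <"$tmpdir/count.file.1" | sed 's/,$//')
printf 'tr | sed: %s\n' "$cols"

# Alternative: paste -s joins all lines of the file, -d sets the joiner.
cols_paste=$(paste -s -d ',' "$tmpdir/count.file.1")
printf 'paste:    %s\n' "$cols_paste"

rm -rf "$tmpdir"
```

Both variables should hold the same comma-separated list, which can then be passed to cut -f.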

Looping over your count.file.* files:

for cfile in count.file.*; do
    cut -f $(tr '\n' ',' <"$cfile" | sed 's/,$//') genotype.file >genotype-"${cfile##*.}"
done

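This creates new files named genotype-N, where N is the number of the count.file.N file that was used to generate it from genotype.file. The number is taken from the end of the filename.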
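The loop and the ${cfile##*.} expansion can be tried on toy data first. This is a minimal sketch assuming a four-column, tab-delimited genotype.file and two count files, all invented for illustration and kept in a scratch directory.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmpdir=$(mktemp -d)
printf 'a\tb\tc\td\n1\t2\t3\t4\n' >"$tmpdir/genotype.file"
printf '%s\n' 1 3 >"$tmpdir/count.file.1"
printf '%s\n' 2 4 >"$tmpdir/count.file.2"

for cfile in "$tmpdir"/count.file.*; do
    # ${cfile##*.} removes everything up to and including the last dot,
    # leaving only the numeric suffix of the count file name.
    suffix=${cfile##*.}
    cut -f "$(tr '\n' ',' <"$cfile" | sed 's/,$//')" \
        "$tmpdir/genotype.file" >"$tmpdir/genotype-$suffix"
done

result1=$(cat "$tmpdir/genotype-1")   # columns 1 and 3
result2=$(cat "$tmpdir/genotype-2")   # columns 2 and 4
printf '%s\n' "$result1" "$result2"

rm -rf "$tmpdir"
```

Note that cut always emits the selected columns in the order they occur in the input, regardless of the order in which they are listed.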

If genotype.file is not tab-delimited, you can convert it to tab-delimited form:

tr -s ' ' '\t' <genotype.file >genotype.tsv

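This assumes that the columns in the original file are separated only by spaces. The tr command replaces each run of consecutive spaces with a single tab. The result is redirected to a new file, on which you can then use the cut commands above.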
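A quick way to see what tr -s does here is to feed it a couple of made-up lines (the input strings below are assumptions for illustration). The second case shows a caveat: leading spaces on a line also become a tab, which gives the line an empty first field.

```shell
# Runs of spaces are translated to tabs, then squeezed into one tab each.
spaced=$(printf 'a   b c\n1 2    3\n' | tr -s ' ' '\t')
printf '%s\n' "$spaced"

# Leading spaces turn into a single leading tab (an empty first field).
lead=$(printf '  x y\n' | tr -s ' ' '\t')
printf '%s\n' "$lead"
```

If the real file may contain leading spaces, stripping them before the conversion would avoid shifting the column numbers.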

Using awk:

awk 'NR == FNR { c[++n] = $0; next } { t=$c[1]; for (i=2; i<=n; ++i) t = t OFS $c[i]; print t }' count.file.1 genotype.file

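This first reads the column numbers that we want to extract from genotype.file out of count.file.1 and into the array c. Then, while genotype.file is being read, these column numbers are used to extract the data. t is a temporary variable that holds the output line built from the selected columns.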
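The following sketch runs the same awk program on invented data. The -F '\t' and -v OFS='\t' options are additions assumed here so that tab-delimited input stays tab-delimited on output; the command above relies on awk's default whitespace splitting and space-separated output instead.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmpdir=$(mktemp -d)
printf 'a\tb\tc\td\n1\t2\t3\t4\n' >"$tmpdir/genotype.file"
printf '%s\n' 3 1 >"$tmpdir/count.file.1"   # deliberately not sorted

# First file: store each column number in c[1..n].
# Second file: build the output line from fields $c[1], $c[2], ...
awk_out=$(awk -F '\t' -v OFS='\t' '
    NR == FNR { c[++n] = $0; next }
    { t = $c[1]; for (i = 2; i <= n; ++i) t = t OFS $c[i]; print t }
' "$tmpdir/count.file.1" "$tmpdir/genotype.file")
printf '%s\n' "$awk_out"

rm -rf "$tmpdir"
```

Unlike cut, this prints the columns in the order they are listed in the count file, so here column 3 comes before column 1.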

Looping over your count.file.* files:

for cfile in count.file.*; do
    awk 'NR == FNR { c[++n] = $0; next } { t=$c[1]; for (i=2; i<=n; ++i) t = t OFS $c[i]; print t }' \
        "$cfile" genotype.file >genotype-"${cfile##*.}"
done

This creates new files called genotype-N in the same way as the cut solution.

Answer 2

Using awk only, with a simple script.

awk '{ printf "{ print ";for(i=1; i<NF; i++){ printf "$%d, ",$i};
       print "$"$i" }" }' <<< "$(awk '{printf $0" "}' count.file.{1..50})" >genotype.awk

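This generates an awk script named genotype.awk, like the one shown below, which collects all the column numbers from all the count.file.{1..50} files. We used brace expansion here so that awk reads all 50 files.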

{ print $51, $92, $166, $169, $196, $199, $213, $228, $229, $284, $291, $297, ... }

Usage:

awk -f genotype.awk genotype.file

This runs the genotype.awk script on genotype.file and prints only the columns whose numbers were collected.
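The script-generation idea can be tried on a small scale as follows. This is a sketch under a few assumptions: the two count files and the five-column genotype.file are invented, a pipe replaces the here-string so that it also runs in a plain POSIX shell, and -F '\t' with -v OFS='\t' are added to keep the output tab-delimited.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
tmpdir=$(mktemp -d)
printf 'a\tb\tc\td\te\n1\t2\t3\t4\t5\n' >"$tmpdir/genotype.file"
printf '%s\n' 1 3 >"$tmpdir/count.file.1"
printf '%s\n' 5 >"$tmpdir/count.file.2"

# Step 1: put all column numbers on one line.
# Step 2: turn that line into an awk print statement.
awk '{ printf "%s ", $0 } END { print "" }' \
    "$tmpdir/count.file.1" "$tmpdir/count.file.2" |
    awk '{ printf "{ print "; for (i = 1; i < NF; i++) printf "$%d, ", $i
           print "$" $i " }" }' >"$tmpdir/genotype.awk"

script=$(cat "$tmpdir/genotype.awk")
printf '%s\n' "$script"

# Step 3: run the generated script against the data file.
picked=$(awk -F '\t' -v OFS='\t' -f "$tmpdir/genotype.awk" "$tmpdir/genotype.file")
printf '%s\n' "$picked"

rm -rf "$tmpdir"
```

Because all count files feed a single generated script, this approach produces one combined output rather than one output file per count file.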
