AWK - Question about columns

Question 1

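This makes a single pass over the files and does not need to hold a whole file in memory. It does, however, keep an open file descriptor for each destination file.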

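awk -F '\t' '
    # First file (populations.txt): remember the region of each column id
    NR==FNR {population[$1]=$2; next}
    # Header line of database.txt: map each column number to its region file
    FNR==1 {
        for (i=1; i<=NF; i++) {
            destination[i] = population[$i] ".txt"
        }
    }
    # Every line of database.txt (header included): write each field to its
    # region file, separating fields that go to the same file with FS
    {
        delete separator
        for (i=1; i<=NF; i++) {
            printf "%s%s", separator[destination[i]], $i > destination[i]
            separator[destination[i]] = FS
        }
        # End the current line in every file written to
        for (file in separator) {
            printf "\n" > file
        }
    }
' populations.txt database.txt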
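To see what the one-pass script produces, here is a small run on made-up input. The file contents (three columns id1, id2, id3 assigned to the hypothetical regions EUR and AFR) are assumptions for illustration only; the awk program is the same as above. It runs in a scratch directory so it cannot overwrite real files.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical input: populations.txt maps each column id to a region,
# database.txt has the column ids in its header row (tab-separated).
printf 'id1\tEUR\nid2\tAFR\nid3\tEUR\n' > populations.txt
printf 'id1\tid2\tid3\n0\t1\t2\n1\t1\t0\n' > database.txt

awk -F '\t' '
    NR==FNR {population[$1]=$2; next}
    FNR==1 {
        for (i=1; i<=NF; i++) {
            destination[i] = population[$i] ".txt"
        }
    }
    {
        delete separator
        for (i=1; i<=NF; i++) {
            printf "%s%s", separator[destination[i]], $i > destination[i]
            separator[destination[i]] = FS
        }
        for (file in separator) {
            printf "\n" > file
        }
    }
' populations.txt database.txt

# EUR.txt gets columns 1 and 3, AFR.txt gets column 2
cat EUR.txt
cat AFR.txt
```

Each region file keeps the header row, so the column ids travel with their data. An id missing from populations.txt would be written to a file literally named `.txt`.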

Question 2

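I believe this is not the best approach, because database.txt has to be read as many times as there are regions, plus one. Unfortunately, I could not come up with another way.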

1. Transpose database.txt:

awk '{for(i=1;i<=NF;i++){a[NR,i]=$i}}NF>p{p=NF}END{for(j=1;j<=p;j++){str=a[1,j];for(i=2;i<=NR;i++){str=str" "a[i,j];}print str}}' database.txt > database.tmp

A more readable version (same command):

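awk '
{
    # Store every field, indexed by (row, column)
    for (i=1; i<=NF; i++) {
        a[NR,i] = $i
    }
}
# Track the widest row seen so far
NF>p { p = NF }
END {
    # Column j of the input becomes row j of the output
    for (j=1; j<=p; j++) {
        str = a[1,j]
        for (i=2; i<=NR; i++) {
            str = str " " a[i,j]
        }
        print str
    }
}' database.txt > database.tmp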
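The same transpose program is used again in step 3, so one option is to wrap it once in a shell function. The name `transpose` is hypothetical, and the table below is made-up data. Like the original, the function keeps the whole input in the array `a`, so memory use grows with the file size; for a rectangular table, transposing twice gives back the input.

```shell
#!/bin/sh
set -e

# transpose (hypothetical helper): prints the transpose of its input files,
# or of stdin when called without arguments; output fields are space-separated.
transpose() {
    awk '
    { for (i=1; i<=NF; i++) a[NR,i] = $i }
    NF>p { p = NF }
    END {
        for (j=1; j<=p; j++) {
            str = a[1,j]
            for (i=2; i<=NR; i++) str = str " " a[i,j]
            print str
        }
    }' "$@"
}

# Made-up table: a header row of ids and two data rows
table=$(printf 'id1 id2 id3\n0 1 2\n1 1 0\n')
transposed=$(printf '%s\n' "$table" | transpose)
printf '%s\n' "$transposed"
```

With the function defined, step 1 becomes `transpose database.txt > database.tmp`.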

2. Read the file with the ids and grep each id from the transposed database.tmp:

while read -r id region ; do grep -m 1 "^$id " database.tmp >> "$region.txt.tmp" ; done < population.txt
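A sketch of step 2 on made-up data, run in a scratch directory. It anchors each id at the start of the line, because an unanchored pattern such as `id1` would also match a line beginning with `id10`. This assumes the ids contain no regex metacharacters and that fields in database.tmp are separated by single spaces, as the transpose step produces. `-m` is supported by GNU and BSD grep but is not in POSIX.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical transposed database: one line per column id
printf 'id10 5 5\nid1 0 1\nid2 1 1\n' > database.tmp
# Hypothetical id -> region list
printf 'id1 EUR\nid2 AFR\nid10 EUR\n' > population.txt

while read -r id region; do
    # "^$id " only matches the id as the first word of a line
    grep -m 1 "^$id " database.tmp >> "$region.txt.tmp"
done < population.txt

cat EUR.txt.tmp AFR.txt.tmp
```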

3. Transpose all the region.txt.tmp files back into the form you need:

for region_file in *.txt.tmp ; do awk '{for(i=1;i<=NF;i++){a[NR,i]=$i}}NF>p{p=NF}END{for(j=1;j<=p;j++){str=a[1,j];for(i=2;i<=NR;i++){str=str" "a[i,j];}print str}}' "$region_file" > "${region_file%.tmp}" ; done
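A small run of step 3, assuming made-up `.txt.tmp` files like those step 2 would produce (the region names EUR and AFR are hypothetical). `${region_file%.tmp}` strips the trailing `.tmp`, so `EUR.txt.tmp` is written back as `EUR.txt`, with the ids returned to the header row.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Made-up output of step 2: one transposed line per id
printf 'id1 0 1\nid3 2 0\n' > EUR.txt.tmp
printf 'id2 1 1\n' > AFR.txt.tmp

for region_file in *.txt.tmp; do
    awk '{for(i=1;i<=NF;i++){a[NR,i]=$i}}NF>p{p=NF}END{for(j=1;j<=p;j++){str=a[1,j];for(i=2;i<=NR;i++){str=str" "a[i,j];}print str}}' "$region_file" > "${region_file%.tmp}"
done

cat EUR.txt AFR.txt
```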

4. Delete all the temporary files.
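Step 4 has no command in the original. Assuming the temporary names used in steps 1 to 3 (`database.tmp` and the `*.txt.tmp` files), the cleanup could look like this; the `touch` line only creates stand-in files in a scratch directory for the demonstration.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Stand-ins for the temporaries from steps 1 and 2, plus one final result
touch database.tmp EUR.txt.tmp AFR.txt.tmp EUR.txt

# Remove only the temporaries; the final region .txt files are kept
rm -f -- database.tmp ./*.txt.tmp
ls
```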

Answer