Merge multiple files by first column

Answer 1

Use join:

join -a1 -a2 -e 1 -o auto <(join -a1 -a2 -e 1 -o auto file1 file2) file3

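From man join:

-a FILENUM
also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2

-e EMPTY
replace missing input fields with EMPTY

-o FORMAT
obey FORMAT while constructing output line

If FORMAT is the keyword 'auto', then the first line of each file determines the number of fields output for each line.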
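
To see what these options do together, here is a minimal sketch on two tiny made-up files. The names a.txt and b.txt and their contents are assumptions for illustration only, and -o auto assumes GNU join.

```sh
# Hypothetical sample data, created in a scratch directory.
tmpdir=$(mktemp -d)
printf '2000 0.1\n2001 0.2\n' > "$tmpdir/a.txt"
printf '2001 0.5\n2002 0.6\n' > "$tmpdir/b.txt"

# Plain join: only keys present in both files survive.
inner=$(join "$tmpdir/a.txt" "$tmpdir/b.txt")

# -a1 -a2 keeps the unpairable lines of both files;
# -e 1 together with -o auto fills each missing field with 1.
outer=$(join -a1 -a2 -e 1 -o auto "$tmpdir/a.txt" "$tmpdir/b.txt")

printf 'inner join:\n%s\n' "$inner"
printf 'outer join:\n%s\n' "$outer"
rm -rf "$tmpdir"
```

The plain join prints only the 2001 line, while the second call prints one line for each of 2000, 2001 and 2002.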

Note: join requires sorted input, so if the inputs are not sorted (the given samples are), sort them first, e.g.:

join -a1 -a2 -e 1 -o auto \
    <(join -a1 -a2 -e 1 -o auto <(sort file1) <(sort file2)) \
    <(sort file3)

To apply this to multiple files:

Join the first two files and save the output to a third file, join.tmp:
```
join -a1 -a2 -e 1 -o auto file1 file2 >join.tmp
```

Next, loop over the rest of the files and update join.tmp on each run:

for file in rest_files*; do
    join -a1 -a2 -e 1 -o auto join.tmp "$file" >join.tmp.1
    mv join.tmp.1 join.tmp
done

In the end, join.tmp holds your final joined result.
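
The steps above can be wrapped in a small function. This is only a sketch: join_all is a made-up helper name, it assumes GNU join (for -o auto) and mktemp, and it sorts every input on the first field so unsorted files are handled too.

```sh
# join_all FILE...  (hypothetical helper, not part of coreutils)
# Full outer join of all files on column 1; missing values become 1.
# The body runs in a subshell, so its variables stay local.
join_all() (
    acc=$(mktemp) && sorted=$(mktemp) && next=$(mktemp) || exit 1
    sort -k 1,1 "$1" > "$acc"          # the first file seeds the accumulator
    shift
    for f in "$@"; do
        sort -k 1,1 "$f" > "$sorted"   # join needs sorted input
        join -a1 -a2 -e 1 -o auto "$acc" "$sorted" > "$next"
        mv "$next" "$acc"              # the result becomes the new accumulator
    done
    cat "$acc"
    rm -f "$acc" "$sorted" "$next"
)
```

It would be called as join_all file1 file2 file3 >final.file. Temporary files are used instead of process substitution so that the sketch also runs in a plain POSIX sh.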

To print with a header:

$ hdr() { awk 'FNR==1{ print "\0", FILENAME }1' "$1"; }
$ join -a1 -a2 -e 1 -o auto \
      <(join -a1 -a2 -e 1 -o auto <( hdr file1) <(hdr file2)) \
      <(hdr file3) |tr -d '\0'

For the multi-file version:

$ hdr() { awk 'FNR==1{ print "\0", FILENAME }1' "$1"; }
$ join -a1 -a2 -e 1 -o auto <(hdr file1) <(hdr file2) >join.tmp
$ for file in rest_files*; do
     join -a1 -a2 -e 1 -o auto join.tmp <(hdr "$file") >join.tmp.1
     mv join.tmp.1 join.tmp
  done
$ tr -d '\0' <join.tmp >final.file
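
If the awk at hand does not pass the NUL byte through reliably (worth testing, since implementations may differ), one alternative is to build the header line separately and prepend it to the joined body. A minimal sketch, where add_header is a made-up helper name and the label key for the first column is an arbitrary choice:

```sh
# add_header NAME...  (hypothetical helper)
# Prints "key NAME NAME ..." and then copies the joined body from stdin.
add_header() {
    printf 'key'
    printf ' %s' "$@"    # printf repeats the format once per file name
    printf '\n'
    cat
}

# Example use with an already joined body on stdin:
printf '2000 0.1 1\n' | add_header file1 file2
```

With this approach the data files are joined without any header lines, so there is no extra key that has to be kept in sort order.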

Answer 2

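It is a bit clumsy, but this awk code works. It uses pseudo-multidimensional arrays, in which the array indices are concatenated with SUBSEP. It holds all the data in RAM, so it is limited in that respect.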

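# Collect every key, every file name, and the value for each (key, file) pair.
{ x[$1] = $1; file[FILENAME] = FILENAME; y[$1, FILENAME] = $2 }

END {
    # Header row: one column per input file.
    for (i in file) printf "\t%s", file[i]
    printf "\n"
    # One row per key; a missing (key, file) value is printed as 1.
    for (i in x) {
        printf "%s", x[i]
        for (j in file) {
            if (y[x[i], file[j]] != "") printf "\t%s", y[x[i], file[j]]
            else printf "\t%s", "1"
        }
        printf "\n"
    }
}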

The output is only tab-separated; for a fixed-width format, the printf commands need to be adjusted accordingly:

    file1   file2   file3
2000    0.0202094   0.0343179   1
2001    0.0225532   1   0.03
2002    0.02553 1   1
2003    0.0261099   0.039579    1
2004    1   0.0412106   0.068689
2006    0.028843    0.041264    0.0645474
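
As a sketch of what adjusting printf for a fixed format could look like, the variant below pads the key column to 8 characters and each value column to 12. The function name merge_fixed and the column widths are assumptions. Because awk does not define the order of for (k in array), the rows are piped through sort, and the files are kept in a numbered array so the columns follow the argument order.

```sh
# merge_fixed FILE...  (hypothetical helper; widths 8 and 12 are arbitrary)
merge_fixed() {
    awk '
        FNR == 1 { files[++n] = FILENAME }   # one column per non-empty input file
        { keys[$1] = 1; val[$1, FILENAME] = $2 }
        END {
            printf "%-8s", ""
            for (j = 1; j <= n; j++) printf "%-12s", files[j]
            printf "\n"
            fflush()                         # emit the header before sort writes the rows
            for (k in keys) {
                row = sprintf("%-8s", k)
                for (j = 1; j <= n; j++)
                    row = row sprintf("%-12s", ((k, files[j]) in val) ? val[k, files[j]] : "1")
                print row | "sort"
            }
            close("sort")
        }
    ' "$@"
}
```

It would be called as merge_fixed file1 file2 file3. Like the original, it keeps all data in memory.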
