使用 awk 比较多列文件并打印输出

使用 awk 比较多列文件并打印输出

文件1:

cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 283 F30109_THK_T0_CFPUR0701_DS005 2022-06-02 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 284 F30109_THK_T0_CFPUR0701_DS005 2022-06-03 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 285 F30109_THK_T0_CFPUR0701_DS005 2022-06-04 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 286 F30109_THK_T0_CFPUR0701_DS005 2022-06-05 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 287 F30109_THK_T0_CFPUR0701_DS005 2022-06-06 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 288 F30109_THK_T0_CFPUR0701_DS005 2022-06-07 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS006 hostgroups-1d1w 282 F30109_THK_T0_CFPUR0701_DS006 2022-06-01 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 283 F30109_THK_T0_CFPUR0702_DS006 2022-06-02 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 284 F30109_THK_T0_CFPUR0702_DS006 2022-06-03 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 285 F30109_THK_T0_CFPUR0702_DS006 2022-06-04 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 286 F30109_THK_T0_CFPUR0702_DS006 2022-06-05 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 287 F30109_THK_T0_CFPUR0702_DS006 2022-06-06 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS006 hostgroups-1d1w 288 F30109_THK_T0_CFPUR0703_DS006 2022-06-07 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 282 F30109_THK_T0_CFPUR0703_DS007 2022-06-01 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 283 F30109_THK_T0_CFPUR0703_DS007 2022-06-02 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 284 F30109_THK_T0_CFPUR0703_DS007 2022-06-03 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 285 F30109_THK_T0_CFPUR0703_DS007 2022-06-04 00:00:00

文件2:

cfpur0701 hostgroups-1d1w 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0701 hostgroups-1d1w 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0701 hostgroups-1d1w 2022-06-04 00:00:00 2022-06-11 00:00:00
cfpur0701 hostgroups-1d1w 2022-06-05 00:00:00 2022-06-12 00:00:00
cfpur0701 hostgroups-1d1w 2022-06-06 00:00:00 2022-06-13 00:00:00
cfpur0701 hostgroups-1d1w 2022-06-07 00:00:00 2022-06-14 00:00:00
cfpur0701 hostgroups-1d1w 2022-06-01 00:00:00 2022-06-08 00:00:00
cfpur0702 hostgroups-1d1w 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0702 hostgroups-1d1w 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0702 hostgroups-1d1w 2022-06-04 00:00:00 2022-06-11 00:00:00
cfpur0702 hostgroups-1d1w 2022-06-05 00:00:00 2022-06-12 00:00:00
cfpur0702 hostgroups-1d1w 2022-06-06 00:00:00 2022-06-13 00:00:00
cfpur0702 hostgroups-1d1w 2022-06-07 00:00:00 2022-06-14 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-01 00:00:00 2022-06-08 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-04 00:00:00 2022-06-11 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-05 00:00:00 2022-06-12 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-06 00:00:00 2022-06-13 00:00:00
cfpur0703 hostgroups-1d1w 2022-06-07 00:00:00 2022-06-14 00:00:00
cfpur0801 hostgroups-1d1w 2022-06-01 00:00:00 2022-06-08 00:00:00
cfpur0801 hostgroups-1d1w 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0801 hostgroups-1d1w 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0801 hostgroups-1d1w 2022-06-04 00:00:00 2022-06-11 00:00:00

期望的输出:

cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 283 F30109_THK_T0_CFPUR0701_DS005 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 284 F30109_THK_T0_CFPUR0701_DS005 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 285 F30109_THK_T0_CFPUR0701_DS005 2022-06-04 00:00:00 2022-06-11 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 286 F30109_THK_T0_CFPUR0701_DS005 2022-06-05 00:00:00 2022-06-12 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 287 F30109_THK_T0_CFPUR0701_DS005 2022-06-06 00:00:00 2022-06-13 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS005 hostgroups-1d1w 288 F30109_THK_T0_CFPUR0701_DS005 2022-06-07 00:00:00 2022-06-14 00:00:00
cfpur0701 Pgroup F30109 cf3010922 F30109_FPUR0701_PRD03E_DS006 hostgroups-1d1w 282 F30109_THK_T0_CFPUR0701_DS006 2022-06-01 00:00:00 2022-06-08 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 283 F30109_THK_T0_CFPUR0702_DS006 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 284 F30109_THK_T0_CFPUR0702_DS006 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 285 F30109_THK_T0_CFPUR0702_DS006 2022-06-04 00:00:00 2022-06-11 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 286 F30109_THK_T0_CFPUR0702_DS006 2022-06-05 00:00:00 2022-06-12 00:00:00
cfpur0702 Pgroup F30109 cf3010922 F30109_FPUR0702_PRD03E_DS006 hostgroups-1d1w 287 F30109_THK_T0_CFPUR0702_DS006 2022-06-06 00:00:00 2022-06-13 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS006 hostgroups-1d1w 288 F30109_THK_T0_CFPUR0703_DS006 2022-06-07 00:00:00 2022-06-14 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 282 F30109_THK_T0_CFPUR0703_DS007 2022-06-01 00:00:00 2022-06-08 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 283 F30109_THK_T0_CFPUR0703_DS007 2022-06-02 00:00:00 2022-06-09 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 284 F30109_THK_T0_CFPUR0703_DS007 2022-06-03 00:00:00 2022-06-10 00:00:00
cfpur0703 Pgroup F30109 cf3010922 F30109_FPUR0703_PRD03E_DS007 hostgroups-1d1w 285 F30109_THK_T0_CFPUR0703_DS007 2022-06-04 00:00:00 2022-06-11 00:00:00

文件 1 中的行数始终大于文件 2 中的行数。我需要执行以下操作:

比较文件 1 和文件 2 中的第 1 列,如果它们匹配,则将文件 1 中的第 6 列与文件 2 中的第 2 列进行比较,最后将文件 1 中的第 9 列与文件 2 中的第 3 列进行比较。当满足所有 3 个条件时,获取时间戳(第 5 列) & 6 ) 从 file2 并将其附加到 file1 从而生成输出文件。

我尝试了 awk 的几个版本,NR==FNR但没有取得太大进展。

答案1

awk 'FNR==NR{map[$1,$2,$3,$4] = $5 FS $6; next}
     ($1,$6,$9,$10) in map {print $0,map[$1,$6,$9,$10]}' file2 file1

我在映射中使用了日期和时间字段,因为我认为您需要它(与 file1 的最后两列匹配)

在第一个文件传递 ( FNR==NR) 中,我们将要附加的字段(最后 2 个)存储到常用数组中,前 4 个用作哈希。next避免为第一个文件的任何行运行其余代码是必要的。

对于第二个文件,仅当所选字段形成现有数组散列时,我们才会追加数组值并打印行。

相关内容