使用 awk 比较两个 csv 文件并附加值

使用 awk 比较两个 csv 文件并附加值

我有两个文件如下: 文件1.csv

+------------+----------+--------+---------+
| Account_ID | Asset_ID | LOT_ID | FLAG_F1 |
+------------+----------+--------+---------+
|      10000 |    20000 |  30000 | Y       |
|      10001 |    20001 |  30001 | N       |
|      10002 |    20002 |  30002 | Y       |
|      10003 |    20003 |  30003 | N       |
|      10004 |    20004 |  30004 | Y       |
|      10005 |    20005 |  30005 | N       |
|      10006 |    20006 |  30006 | Y       |
+------------+----------+--------+---------+

文件2.csv

   +------------+----------+--------+---------+-----+-----+
| Account_ID | Asset_ID | LOT_ID | FLAG_F2 | XYZ | ABC |
+------------+----------+--------+---------+-----+-----+
|      10000 |    20000 |  30000 | Y       | XYZ | ABC |
|      10001 |    20001 |  30001 | Y       | XYZ | ABC |
|      10002 |    20002 |  30002 | Y       | XYZ | ABC |
|      10003 |    20003 |  30003 | Y       | XYZ | ABC |
|      10004 |    20004 |  30004 | Y       | XYZ | ABC |
|      10005 |    20005 |  30005 | Y       | XYZ | ABC |
|      10006 |    20006 |  30006 | Y       | XYZ | ABC |
|      10006 |    20006 |  30006 | Y       | XYZ | ABC |
|      10006 |    20006 |  30006 | Y       | XYZ | ABC |
+------------+----------+--------+---------+-----+-----+

我正在尝试获得以下内容输出:

    +------------+----------+--------+---------+-----+-----+---------+
| Account_ID | Asset_ID | LOT_ID | FLAG_F2 | XYZ | ABC | FLAG_F1 |
+------------+----------+--------+---------+-----+-----+---------+
|      10000 |    20000 |  30000 | Y       | XYZ | ABC | Y       |
|      10001 |    20001 |  30001 | Y       | XYZ | ABC | N       |
|      10002 |    20002 |  30002 | Y       | XYZ | ABC | Y       |
|      10003 |    20003 |  30003 | Y       | XYZ | ABC | N       |
|      10004 |    20004 |  30004 | Y       | XYZ | ABC | Y       |
|      10005 |    20005 |  30005 | Y       | XYZ | ABC | N       |
|      10006 |    20006 |  30006 | Y       | XYZ | ABC | Y       |
|      10006 |    20006 |  30006 | Y       | XYZ | ABC | Y       |
|      10007 |    20007 |  30006 | Y       | XYZ | ABC |         |
|      10006 |    20003 |  30006 | Y       | XYZ | ABC |         |
+------------+----------+--------+---------+-----+-----+---------+

在上面的输出中,我将FLAG_F1from添加到的条件file1.csv中,并且和 的值相等。如果条件失败,则可以为空。file2.csvAccount_ID,Asset_ID,LOT_IDfile1.csvfile2.csv

我尝试过以下使用 awk 的代码使用 awk 比较两个 .csv 文件

awk -F',' '
    FNR == NR {
        if (FNR == 1) {next}
        a[$1] = $2;
        b[$1] = $3;
        next;
    }
    {
        if (FNR == 1) {print;next}
        if (a[$1] == $2) {
            print $1,$2,$3,b[$1];
        }
        else {
            print $1,a[$1],b[$1],b[$1];
        }
    }
  ' OFS=',' file1.csv file2.csv

如果有人逐行向我解释上面的代码就更好了。

答案1

这比链接的问题简单得多。所有你需要的是:

awk -F, -v OFS=, 'NR==FNR{a[$1$2$3]=$4; next}{print $0,a[$1$2$3]}' file1 file2

解释

  • -F,:将输入字段分隔符设置为逗号。
  • -v OFS=,:将输出字段分隔符设置为逗号。默认情况下,这对于打印逗号分隔的输出很有用。
  • NR==FNR:NR为当前行号,FNR为当前文件的行号。仅当读取第一个文件时,两者才会相同。
  • a[$1$2$3]=$4; next:如果这是第一个文件(见上文),则将第四个字段保存在一个数组中,该数组的键是连接的第一个、第二个和第三个字段。
  • print $0,a[$1$2$3]:打印当前行 ( ) 以及与前三个字段关联的数组$0中的值。a这是第一个文件对应的第四个字段。

答案2

awk -F',' ' # start awk and use comma as a field separator
    FNR == NR { # if processed so far number of rows in current file if equal to overall processed number of rows do things in block {} 
        if (FNR == 1) {next} # if it is first row then continue (skip to next row)
        a[$1] = $2; # create an array indexed with first field, with value of second field
        b[$1] = $3; # another array
        next; # go to next row
    } # end of block executed only for first file
    { # beginning of block which will be executed without any initial conditions
        if (FNR == 1) {print;next} # if first row of file then print it and go to next one
        if (a[$1] == $2) { # if array value which correspond to field first is equal to second field do something (array 'a' has been set in first file, and now we input index to file from second file knowing that first fields of those files are the same)
            print $1,$2,$3,b[$1]; # print field 1-3 and array b[$1]
        }
        else { # if array is not equal
            print $1,a[$1],b[$1],b[$1]; # print stuff
        }
    }
  ' OFS=',' file1.csv file2.csv # OFS means output field separator, so we want to have comma in result too.

相关内容