Merge two files


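I am trying to merge two csv files on their first column, covering both exact string matches and partial matches. Lines that have no match should still be included, with the second column left blank. For example: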

file1.csv
string1.str1.co.in,ZSER
string2.str2.com,ABCD
string3.str.co.in,ZSE
string4.str2.com,ACD
......

file2.csv
string1.str1.co.in, [A], hello1, hello2
string2.str2.com, 2nd, hello
string3, 3rd, helloz
string4, 4th, hellox
string5, 5th, helloo
string6, 6th, helloop
......


expected output
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
string3,ZSE, 3rd, helloz
string4,ACD, 4th, hellox
string5,   ,5th, helloo
string6,   ,6th, helloop
.....

I have tried the following, but it only handles exact matches and prints only those lines. Can we add a few lines to get the expected result?

awk '
 BEGIN{
 FS=OFS=","
 }
 FNR==NR{
 sub(/^ /,"")
 val=$1
 $1=""
 sub(/,/,"")
 sub(/,$/,"")
 a[val]=$0
 next
 }
 $1 in a{
 $1=$1 OFS a[$1]
 print $0
 }' file1.csv file2.csv

string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello

If this approach makes the expected result hard to obtain, please suggest an alternative.

Answer 1

You can split the first field on . and use only the first element as the associative array key. Given:

$ head file{1,2}
==> file1 <==
string1.str1.co.in,ZSER
string2.str2.com,ABCD
string3.str.co.in,ZSE
string4.str2.com,ACD

==> file2 <==
string1.str1.co.in, [A], hello1, hello2
string2.str2.com, 2nd, hello
string3, 3rd, helloz
string4, 4th, hellox
string5, 5th, helloo
string5, 6th, helloop

then

$ awk '
    BEGIN {OFS=FS=","}
    split($1,b,".") {key = b[1]}
    NR==FNR {a[key] = $2; next}
    (key in a) {$1 = $1 OFS a[key]}
    1
' file1 file2
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
string3,ZSE, 3rd, helloz
string4,ACD, 4th, hellox
string5, 5th, helloo
string5, 6th, helloop
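
To see which key each line is matched on, the split step can be run on its own. This is only a minimal sketch: show_key is a hypothetical helper name introduced for illustration, and the two input lines are taken from the sample files above.

```shell
# show_key (hypothetical helper): print each first field next to the key
# that split($1, b, ".") derives from it, i.e. the text before the first dot.
show_key() {
    awk -F, '{ split($1, b, "."); print $1 " -> " b[1] }'
}

printf '%s\n' 'string1.str1.co.in,ZSER' 'string3, 3rd, helloz' | show_key
# string1.str1.co.in -> string1
# string3 -> string3
```

Both the full name and the short name reduce to the same kind of key, which is why string3 in file2 can pick up the value stored for string3.str.co.in in file1.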

If you want to output a blank or some other string when there is no match, you can change the pattern-action pair (key in a) {$1 = $1 OFS a[key]} to {$1 = (key in a) ? $1 OFS a[key] : $1 OFS " "}

$ awk '
    BEGIN {OFS=FS=","}
    split($1,b,".") {key = b[1]}
    NR==FNR {a[key] = $2; next}
    {$1 = (key in a) ? $1 OFS a[key] : $1 OFS " "}
    1
' file1 file2
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
string3,ZSE, 3rd, helloz
string4,ACD, 4th, hellox
string5, , 5th, helloo
string5, , 6th, helloop
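
One thing to keep in mind with the prefix key: if file1 contains several names that share the same first label, the last one read overwrites the earlier ones in the array. A possible variant, sketched here under assumptions, looks for an exact match on the whole first field first and only then falls back to the prefix. The helper name merge_exact_then_prefix, the exact and prefix array names, the "first line wins" rule for shared prefixes, and the sample data are all made up for illustration.

```shell
# merge_exact_then_prefix FILE1 FILE2 (hypothetical helper name)
# Prefer an exact match on the whole first field; otherwise fall back to
# the text before the first dot; otherwise insert a blank field.
merge_exact_then_prefix() {
    awk '
        BEGIN { FS = OFS = "," }
        { split($1, b, "."); key = b[1] }        # key = text before the first dot
        NR == FNR {                              # first file: build both lookups
            exact[$1] = $2
            if (!(key in prefix)) prefix[key] = $2   # first line wins for a shared prefix
            next
        }
        {                                        # second file: pick the best match
            if ($1 in exact)        extra = exact[$1]
            else if (key in prefix) extra = prefix[key]
            else                    extra = " "
            $1 = $1 OFS extra
            print
        }
    ' "$1" "$2"
}

# Made-up sample data in which two file1 names share the prefix "host".
tmp=$(mktemp -d)
printf '%s\n' 'host.a.com,AAA' 'host.b.com,BBB' > "$tmp/file1"
printf '%s\n' 'host.b.com, x' 'host, y' 'other, z' > "$tmp/file2"
merge_exact_then_prefix "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
# host.b.com,BBB, x
# host,AAA, y
# other, , z
```

On the sample files from the question no two names share a prefix, so this variant should produce the same lines as the command above.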
