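I am trying to merge two CSV files on the first column, covering both exact string matches and partial matches. Rows that do not match should still be added, with the second column left blank. For example: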
file1.csv
string1.str1.co.in,ZSER
string2.str2.com,ABCD
string3.str.co.in,ZSE
string4.str2.com,ACD
......
file2.csv
string1.str1.co.in, [A], hello1, hello2
string2.str2.com, 2nd, hello
string3, 3rd, helloz
string4, 4th, hellox
string5, 5th, helloo
string6, 6th, helloop
......
expected output
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
string3,ZSE, 3rd, helloz
string4,ACD, 4th, hellox
string5, ,5th, helloo
string6, ,6th, helloop
.....
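I have tried the following, but it only handles exact matches and prints only the matching rows. Can a few more lines be added to get the expected result?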
awk '
BEGIN{
    FS=OFS=","
}
FNR==NR{
    sub(/^ /,"")
    val=$1
    $1=""
    sub(/,/,"")
    sub(/,$/,"")
    a[val]=$0
    next
}
$1 in a{
    $1=$1 OFS a[$1]
    print $0
}' file1.csv file2.csv
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
If this approach makes the expected result hard to get, please suggest an alternative.
Answer 1
You can split the first field on "." and use only the first element as the associative array key. Given:
$ head file{1,2}
==> file1 <==
string1.str1.co.in,ZSER
string2.str2.com,ABCD
string3.str.co.in,ZSE
string4.str2.com,ACD
==> file2 <==
string1.str1.co.in, [A], hello1, hello2
string2.str2.com, 2nd, hello
string3, 3rd, helloz
string4, 4th, hellox
string5, 5th, helloo
string5, 6th, helloop
then
$ awk '
BEGIN {OFS=FS=","}
split($1,b,".") {key = b[1]}
NR==FNR {a[key] = $2; next}
(key in a) {$1 = $1 OFS a[key]}
1
' file1 file2
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
string3,ZSE, 3rd, helloz
string4,ACD, 4th, hellox
string5, 5th, helloo
string5, 6th, helloop
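The key derivation used above can be checked in isolation. The following is a small sketch, not part of the original answer: show_keys is a hypothetical helper name, and the two input lines are copied from the sample files. It prints each first field, the key that split() yields, and how many pieces were found.

```shell
# Hypothetical helper: show the key that split() derives from the first field.
show_keys() {
  awk -F, '{ n = split($1, b, "."); print $1 " -> " b[1] " (" n " parts)" }'
}

# One line from each sample file; both should map to the key "string3".
printf '%s\n' 'string3.str.co.in,ZSE' 'string3, 3rd, helloz' | show_keys
# string3.str.co.in -> string3 (4 parts)
# string3 -> string3 (1 parts)
```

Because both forms reduce to the same key, a full name in file1 and a short name in file2 land on the same array entry.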
If you want to output a blank or some other string when there is no match, change the last pattern-action pair from
(key in a) {$1 = $1 OFS a[key]}
to
{$1 = (key in a) ? $1 OFS a[key] : $1 OFS " "}
:
$ awk '
BEGIN {OFS=FS=","}
split($1,b,".") {key = b[1]}
NR==FNR {a[key] = $2; next}
{$1 = (key in a) ? $1 OFS a[key] : $1 OFS " "}
1
' file1 file2
string1.str1.co.in,ZSER, [A], hello1, hello2
string2.str2.com,ABCD, 2nd, hello
string3,ZSE, 3rd, helloz
string4,ACD, 4th, hellox
string5, , 5th, helloo
string5, , 6th, helloop
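One caveat with keying only on the first label: if file1 contained two names that share it, the later row would overwrite the earlier one. A possible variant, sketched here under assumptions, tries the full first field first and falls back to the first label only when there is no exact match. The function name merge_csv, the array names exact and partial, the "first row wins" rule, and the extra string1.other.com row are all made up for illustration and do not come from the question.

```shell
# Hypothetical variant: exact match first, then first-label match, else a blank.
merge_csv() {
  # $1: lookup file (name,value)   $2: file to annotate
  awk '
    BEGIN { FS = OFS = "," }
    { split($1, b, "."); key = b[1] }
    NR == FNR {
      exact[$1] = $2                             # keyed on the full first field
      if (!(key in partial)) partial[key] = $2   # keyed on first label; first row wins
      next
    }
    {
      if ($1 in exact)         val = exact[$1]
      else if (key in partial) val = partial[key]
      else                     val = " "
      $1 = $1 OFS val
      print
    }
  ' "$1" "$2"
}

# Illustrative data only; the string1.other.com row is invented to force a collision.
tmp=$(mktemp -d)
printf '%s\n' 'string1.str1.co.in,ZSER' 'string1.other.com,QQQ' 'string3.str.co.in,ZSE' > "$tmp/file1"
printf '%s\n' 'string1.str1.co.in, [A], hello1' 'string3, 3rd, helloz' 'string5, 5th, helloo' > "$tmp/file2"
merge_csv "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
# string1.str1.co.in,ZSER, [A], hello1
# string3,ZSE, 3rd, helloz
# string5, , 5th, helloo
```

With the first-label-only key, the first row of this data would pick up QQQ instead of ZSER, because the later string1 entry replaces the earlier one. Like the scripts above, this relies on NR==FNR, so it assumes the lookup file is not empty.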