这是来自这里的问题对齐 csv 文件中的值
问题是我想根据第五列上的值与第一列的匹配方式来对齐它们。
输入:
"DB Instance Identifier","Engine","Instance Type","MultiAZ","Reservation ID","State","Start Time","Reserved Engine","Reserved DB Instance Class"
"alpha-db-dev","postgres","db.t4g.micro",false,"alpha-db-dev-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"alpha-db-prod","postgres","db.t4g.small",true,"alpha-db-prod-26aug2022","active","2022-08-26","postgresql","db.t4g.small"
"alpha-db-staging","postgres","db.t4g.micro",false,"alpha-db-staging-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"beta-db-dev-primary","aurora-postgresql","db.t3.medium",false,"charlie-db-dev-8dec2021","active","2021-12-08","postgresql","db.t3.micro"
"beta-db-prod-primary","aurora-postgresql","db.r5.xlarge",false,
"beta-db-prod-replica","aurora-postgresql","db.r5.xlarge",false,
"beta-db-staging-primary","aurora-postgresql","db.t3.medium",false,
"charlie-db-dev","postgres","db.t3.micro",false,
"charlie-db-prod-dms","postgres","db.m4.4xlarge",true,
"charlie-db-prod-dms-replica","postgres","db.m4.4xlarge",false,
"charlie-db-staging","postgres","db.t2.large",false,
"charlie-db-staging-loadtest","postgres","db.m4.4xlarge",false,
"charlie-kong-db-dev","postgres","db.t3.micro",false,
"charlie-kong-db-prod","postgres","db.m4.large",true,
"charlie-kong-db-staging","postgres","db.t2.small",false,
"delta-db-prod","mysql","db.t3.small",true,
"delta-db-recon-prod","mysql","db.t3.micro",false,
"delta-db-staging","mysql","db.t3.micro",false,
期望的输出是:
"DB Instance Identifier","Engine","Instance Type","MultiAZ","Reservation ID","State","Start Time","Reserved Engine","Reserved DB Instance Class"
"alpha-db-dev","postgres","db.t4g.micro",false,"alpha-db-dev-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"alpha-db-prod","postgres","db.t4g.small",true,"alpha-db-prod-26aug2022","active","2022-08-26","postgresql","db.t4g.small"
"alpha-db-staging","postgres","db.t4g.micro",false,"alpha-db-staging-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"beta-db-dev-primary","aurora-postgresql","db.t3.medium",false,
"beta-db-prod-primary","aurora-postgresql","db.r5.xlarge",false,
"beta-db-prod-replica","aurora-postgresql","db.r5.xlarge",false,
"beta-db-staging-primary","aurora-postgresql","db.t3.medium",false,
"charlie-db-dev","postgres","db.t3.micro",false,"charlie-db-dev-8dec2021","active","2021-12-08","postgresql","db.t3.micro"
"charlie-db-prod-dms","postgres","db.m4.4xlarge",true,
"charlie-db-prod-dms-replica","postgres","db.m4.4xlarge",false,
"charlie-db-staging","postgres","db.t2.large",false,
"charlie-db-staging-loadtest","postgres","db.m4.4xlarge",false,
"charlie-kong-db-dev","postgres","db.t3.micro",false,
"charlie-kong-db-prod","postgres","db.m4.large",true,
"charlie-kong-db-staging","postgres","db.t2.small",false,
"delta-db-prod","mysql","db.t3.small",true,
"delta-db-recon-prod","mysql","db.t3.micro",false,
"delta-db-staging","mysql","db.t3.micro",false,
Ed Morton 在上一个问题中用 awk 脚本回答了这个问题
BEGIN { FS=OFS="," }
NR == FNR {
id = $2
sub(/-[^-]+$/,"",id)
vals[id] = $2 OFS $3
next
}
{ print (FNR>1 ? $1 OFS vals[$1] : $0) }
然而,当我尝试理解并修改脚本中的值以匹配我正在处理的 csv 文件中的值时,我得到了一堆,
或根本得到了错误的值,因为上面的 awk 脚本最初比较了第 1 列和2. 但这一次我希望它比较第一列和第五列。然后整行(第 5-9 列)与第一列对齐。我该怎么办?我仍在学习 awk。
编辑:更改输入数据的值以防止混淆,因为我正在使用逗号分隔的值。我认为当我在 Excel 中打开文件并从那里粘贴而不是使用代码编辑器粘贴时,我粘贴不正确。抱歉造成混乱!
答案1
由于您似乎有多个相似的文件,其中包含不同数量的字段,因此我首先将输入拆分为头部(前 4 个字段加逗号)和尾部(倒数第 5 个字段),因此很容易做到无论您要匹配哪个字段和/或要在匹配字段之前/之后包含多少个字段,例如,如果您想比较并从第 7 个字段而不是第 5 个字段进行复制,您只需更改{4}
为{6}
在这个脚本中:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
match($0,/([^,]+,){4}/)
head = substr($0,RSTART,RLENGTH)
tail = substr($0,RSTART+RLENGTH)
}
NR == FNR {
$0 = tail
id = $1
sub(/-[^-]+$/,"\"",id)
vals[id] = $0
next
}
{
print (FNR>1 ? head vals[$1] : $0)
}
$ awk -f tst.awk file file
"DB Instance Identifier","Engine","Instance Type","MultiAZ","Reservation ID","State","Start Time","Reserved Engine","Reserved DB Instance Class"
"alpha-db-dev","postgres","db.t4g.micro",false,"alpha-db-dev-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"alpha-db-prod","postgres","db.t4g.small",true,"alpha-db-prod-26aug2022","active","2022-08-26","postgresql","db.t4g.small"
"alpha-db-staging","postgres","db.t4g.micro",false,"alpha-db-staging-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"beta-db-dev-primary","aurora-postgresql","db.t3.medium",false,
"beta-db-prod-primary","aurora-postgresql","db.r5.xlarge",false,
"beta-db-prod-replica","aurora-postgresql","db.r5.xlarge",false,
"beta-db-staging-primary","aurora-postgresql","db.t3.medium",false,
"charlie-db-dev","postgres","db.t3.micro",false,"charlie-db-dev-8dec2021","active","2021-12-08","postgresql","db.t3.micro"
"charlie-db-prod-dms","postgres","db.m4.4xlarge",true,
"charlie-db-prod-dms-replica","postgres","db.m4.4xlarge",false,
"charlie-db-staging","postgres","db.t2.large",false,
"charlie-db-staging-loadtest","postgres","db.m4.4xlarge",false,
"charlie-kong-db-dev","postgres","db.t3.micro",false,
"charlie-kong-db-prod","postgres","db.m4.large",true,
"charlie-kong-db-staging","postgres","db.t2.small",false,
"delta-db-prod","mysql","db.t3.small",true,
"delta-db-recon-prod","mysql","db.t3.micro",false,
"delta-db-staging","mysql","db.t3.micro",false,
答案2
您现在使用的文件使用“,”作为字段分隔符,许多字段都用双引号引起来。你可以这样:
$ awk ' BEGIN{ FS=OFS="," }
NR==FNR{
if (NR==1) {next}
id=$5;
sub(/-[^-]+"$/,"\"",id);
a[id]=$5 OFS $6 OFS $7 OFS $8 OFS $9; next
}
{
if (FNR==1) {print; next}
print $1,$2,$3,$4,a[$1]
}
' infile infile
对于给定的输入,将打印以下内容:
"DB Instance Identifier","Engine","Instance Type","MultiAZ","Reservation ID","State","Start Time","Reserved Engine","Reserved DB Instance Class"
"alpha-db-dev","postgres","db.t4g.micro",false,"alpha-db-dev-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"alpha-db-prod","postgres","db.t4g.small",true,"alpha-db-prod-26aug2022","active","2022-08-26","postgresql","db.t4g.small"
"alpha-db-staging","postgres","db.t4g.micro",false,"alpha-db-staging-26aug2022","active","2022-08-26","postgresql","db.t4g.micro"
"beta-db-dev-primary","aurora-postgresql","db.t3.medium",false,
"beta-db-prod-primary","aurora-postgresql","db.r5.xlarge",false,
"beta-db-prod-replica","aurora-postgresql","db.r5.xlarge",false,
"beta-db-staging-primary","aurora-postgresql","db.t3.medium",false,
"charlie-db-dev","postgres","db.t3.micro",false,"charlie-db-dev-8dec2021","active","2021-12-08","postgresql","db.t3.micro"
"charlie-db-prod-dms","postgres","db.m4.4xlarge",true,
"charlie-db-prod-dms-replica","postgres","db.m4.4xlarge",false,
"charlie-db-staging","postgres","db.t2.large",false,
"charlie-db-staging-loadtest","postgres","db.m4.4xlarge",false,
"charlie-kong-db-dev","postgres","db.t3.micro",false,
"charlie-kong-db-prod","postgres","db.m4.large",true,
"charlie-kong-db-staging","postgres","db.t2.small",false,
"delta-db-prod","mysql","db.t3.small",true,
"delta-db-recon-prod","mysql","db.t3.micro",false,
"delta-db-staging","mysql","db.t3.micro",false,