I have the following CSV input:
XiaoLi,6705462234,[email protected],NC764
NatkinPook,8044344528,[email protected],VA22345
EliziMoe,5208534566,[email protected],AZ85282
MaTa,4345667345,[email protected],TX91030
DianaCheng,5203456789,[email protected],WY4587
JacksonFive,5206564573,[email protected],AZ85483
AdiSrikanthReddy,6578904566,[email protected],WS67854
I would like it to output the following:
Xiao Li 6705462234 [email protected] NC 764
Natkin Pook 8044344528 [email protected] VA 22345
Elizi Moe 5208534566 [email protected] AZ 85282
Ma Ta 4345667345 [email protected] TX 91030
Diana Cheng 5203456789 [email protected] WY 4587
Jackson Five 5206564573 [email protected] AZ 85483
Adi SrikanthReddy 6578904566 [email protected] WS 67854
(FirstName LastName PhoneNumber UserID@Email State Zip)
This is what I have so far:
awk -F "," ' {print $1, $4, $3, $6}' data3
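I can't get the first and last names separated from each other, and the state and zip code also run together. How can I split these two cases?
I want to use awk. Is there a way to use something like [A-Z] to split the fields at their capital letters?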
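Answer 1
I see that user Steeldriver's answer has been accepted, but I'd like to offer an option that I think is shorter, simpler, and easier to read. If nothing else, it shows some other features of awk (and the OP can always change his/her mind):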
awk '
# gensub() is a GNU awk (gawk) extension
{ gsub(","," ")                                           # commas -> spaces
  $0=gensub("([[:upper:]])([[:digit:]])","\\1 \\2","g")   # NC764 -> NC 764
  $0=gensub("([[:lower:]])([[:upper:]])","\\1 \\2","g")   # XiaoLi -> Xiao Li
  print
}' file.csv
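For an awk that lacks gensub(), the same split-at-every-transition idea can be sketched with the POSIX match() and substr() functions in a loop. This is only a sketch, not the answerer's code: the function name split_all and the example.com addresses are made up for illustration. Note that a global split like this one also breaks SrikanthReddy into two words, which differs from the output the question asks for.

```shell
# Hypothetical helper: split at every lower->upper and upper->digit transition
# using only POSIX awk features (no gensub).
split_all() {
  awk '{
    gsub(/,/, " ")                                   # commas -> spaces
    # match() sets RSTART to the index of the character before the boundary;
    # each pass inserts one space, so the loop ends when no transition is left
    while (match($0, /[a-z][A-Z]|[A-Z][0-9]/))
      $0 = substr($0, 1, RSTART) " " substr($0, RSTART + 1)
    print
  }'
}

# Sample records with invented e-mail addresses
printf '%s\n' \
  'XiaoLi,6705462234,xiao@example.com,NC764' \
  'AdiSrikanthReddy,6578904566,adi@example.com,WS67854' | split_all
# prints:
# Xiao Li 6705462234 xiao@example.com NC 764
# Adi Srikanth Reddy 6578904566 adi@example.com WS 67854
```

Because the whole record is scanned, an e-mail address containing a lowercase letter followed by an uppercase one would be split as well.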
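Answer 2
At least with gawk (GNU awk) and mawk, you can use the match function to find the index of the lowercase-to-uppercase or uppercase-to-digit transition, then use substr to cut the string and splice it back together: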
awk -F, '
{c = match($1,/[a-z][A-Z]/)}
c>0 {$1 = sprintf("%s %s", substr($1,1,c), substr($1,c+1))}
{c = match($4,/[A-Z][0-9]/)}
c>0 {$4 = sprintf("%s %s", substr($4,1,c), substr($4,c+1))}
1' file.csv
Xiao Li 6705462234 [email protected] NC 764
Natkin Pook 8044344528 [email protected] VA 22345
Elizi Moe 5208534566 [email protected] AZ 85282
Ma Ta 4345667345 [email protected] TX 91030
Diana Cheng 5203456789 [email protected] WY 4587
Jackson Five 5206564573 [email protected] AZ 85483
Adi SrikanthReddy 6578904566 [email protected] WS 67854
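To see what match() contributes, the sketch below prints the index it returns together with the two substr() halves. The helper name show_split and its index|left|right output format are hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: show the index match() returns and the resulting halves
show_split() {
  printf '%s\n' "$1" | awk '{
    # c is the 1-based position of the lowercase letter that precedes
    # the first uppercase letter, or 0 when there is no such transition
    c = match($0, /[a-z][A-Z]/)
    if (c > 0)
      printf "%d|%s|%s\n", c, substr($0, 1, c), substr($0, c + 1)
    else
      printf "0|%s|\n", $0
  }'
}

show_split XiaoLi             # prints 4|Xiao|Li
show_split AdiSrikanthReddy   # prints 3|Adi|SrikanthReddy
```

Since match() reports only the leftmost match, a name with several capital letters is split once, at the first transition.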
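If your $4 really is a US state abbreviation plus ZIP code, then as far as I know the format is fixed (two letters, then the digits), so you can skip the second match and simply do: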
awk -F, '
{c = match($1,/[a-z][A-Z]/)}
c>0 {$1 = sprintf("%s %s", substr($1,1,c), substr($1,c+1))}
{$4 = sprintf("%s %s", substr($4,1,2), substr($4,3))}
1' file.csv
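The fixed-offset split assumes every $4 starts with exactly two letters. A hedged variation, assuming you would rather leave non-conforming values untouched, guards the split with a pattern. The name split_state_zip and the sample records are invented for this sketch. The $1 = $1 assignment forces awk to rebuild the record, so lines where nothing was split are still printed with spaces rather than commas.

```shell
# Hypothetical variant: split $4 at offset 2 only when it looks like
# two uppercase letters followed by digits.
split_state_zip() {
  awk -F, '
    {c = match($1, /[a-z][A-Z]/)}
    c > 0 {$1 = substr($1, 1, c) " " substr($1, c + 1)}
    $4 ~ /^[A-Z][A-Z][0-9]+$/ {$4 = substr($4, 1, 2) " " substr($4, 3)}
    {$1 = $1}   # rebuild $0 with the default OFS even if no field changed
    1'
}

# Sample records with invented values
printf '%s\n' \
  'XiaoLi,6705462234,xiao@example.com,NC764' \
  'madonna,123,m@example.com,unknown' | split_state_zip
# prints:
# Xiao Li 6705462234 xiao@example.com NC 764
# madonna 123 m@example.com unknown
```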
If you have a regex engine that supports zero-length assertions, it gets a bit neater. For example, in Perl:
perl -F, -ne '
print join " ", map { s/(?<=[[:lower:]])(?=[[:upper:]])|(?<=[[:upper:]])(?=[[:digit:]])/ /; $_ } @F
' file.csv
Xiao Li 6705462234 [email protected] NC 764
Natkin Pook 8044344528 [email protected] VA 22345
Elizi Moe 5208534566 [email protected] AZ 85282
Ma Ta 4345667345 [email protected] TX 91030
Diana Cheng 5203456789 [email protected] WY 4587
Jackson Five 5206564573 [email protected] AZ 85483
Adi SrikanthReddy 6578904566 [email protected] WS 67854