Here is part of my CSV file:
publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count
20030219,aba decides against community broadcasting licence,1106,118,109,155,6,5,2,0,6
20030219,act fire witnesses must be aware of defamation,137,362,67,0,0,0,0,0,0
20030219,a g calls for infrastructure protection summit,357,119,212,0,0,0,0,0,0
20030219,air nz staff in aust strike for pay rise,826,254,105,105,21,45,7,0,90
20030219,air nz strike to affect australian travellers,693,123,153,17,113,4,103,0,7
20030219,ambitious olsson wins triple jump,488,57,161,0,0,0,0,0,0
20030219,antic delighted with record breaking barca,386,60,80,3,4,0,93,0,68
20030219,aussie qualifier stosur wastes four memphis match,751,45,297,0,0,0,0,0,0
20030219,aust addresses un security council over iraq,3847,622,141,1,0,0,0,0,0
20030219,australia is locked into war timetable opp,1330,205,874,0,0,0,0,0,0
20030219,australia to contribute 10 million in aid to iraq,3530,130,0,23,16,4,1,0,0
20030219,barca take record as robson celebrates birthday in,13875,331,484,0,0,0,0,0,0
20030219,bathhouse plans move ahead,11202,450,2576,433,51,20,4,0,34
20030219,big hopes for launceston cycling championship,3988,445,955,0,0,0,0,0,0
20030219,big plan to boost paroo water supplies,460,101,92,0,0,0,0,0,0
20030219,blizzard buries united states in bills,303,223,193,0,0,0,0,0,0
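I'd like a shell command that creates a new column holding, for each row, (likes_count+love_count+thankful_count) - (angry_count+sad_count), and names that column emotional_polarity.
I tried:
awk -F , '{$12=$3+$6+$10-$11-$9;}{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' file
but for some reason it doesn't work and the columns get mixed together. I suspect it's because I lose the commas when I do this.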
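Answer 1
Set OFS (the Output Field Separator) as well, so that you don't lose the commas. When you do $12=$3+$6+$10-$11-$9, i.e. set or update the value of any column, awk rebuilds the current line by joining the fields with the OFS internal variable, which defaults to a single space. That is where the commas go; setting OFS to a comma keeps them when the line is printed.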
awk 'BEGIN{ FS=OFS="," }
{ $(NF+1)=(NR==1? "emotional_polarity" : $3+$6+$10-$11-$9); print }' infile
Or simply append the new value to the current input line:
awk -F, '{ $0=$0 FS (NR==1? "emotional_polarity" : $3+$6+$10-$11-$9); print }' infile
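From the awk manual:
FS
The input field separator (see section Specifying how Fields are Separated). The value is a single-character string or a multi-character regular expression that matches the separations between fields in an input record.
OFS
The output field separator (see section Output Separators). It is output between the fields printed by a print statement. Its default value is " ", a string consisting of a single space.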
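The effect of OFS described in this answer can be seen on a one-line input. This is only a small sketch; the a,b,c record is made up for illustration and is not from the question's file.

```shell
# No field is assigned, so awk prints the record exactly as it was read.
printf 'a,b,c\n' | awk -F, '{ print }'
# prints: a,b,c

# Assigning a field makes awk rebuild $0 with OFS, which defaults to a space.
printf 'a,b,c\n' | awk -F, '{ $4 = "d"; print }'
# prints: a b c d

# With OFS set to a comma as well, the rebuilt record keeps its commas.
printf 'a,b,c\n' | awk 'BEGIN { FS = OFS = "," } { $4 = "d"; print }'
# prints: a,b,c,d
```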
Answer 2
If it's useful to refer to the fields by name (e.g. if the order of the columns can change):
$ cat tst.awk
BEGIN { FS=OFS="," }
NR == 1 {
    $(NF+1) = "emotional_polarity"
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
}
NR > 1 {
    $(f["emotional_polarity"]) = \
        ( $(f["likes_count"]) + $(f["love_count"]) + $(f["thankful_count"]) ) \
      - ( $(f["angry_count"]) + $(f["sad_count"]) )
}
{ print }

$ awk -f tst.awk file
publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count,emotional_polarity
20030219,aba decides against community broadcasting licence,1106,118,109,155,6,5,2,0,6,1253
20030219,act fire witnesses must be aware of defamation,137,362,67,0,0,0,0,0,0,137
20030219,a g calls for infrastructure protection summit,357,119,212,0,0,0,0,0,0,357
20030219,air nz staff in aust strike for pay rise,826,254,105,105,21,45,7,0,90,834
20030219,air nz strike to affect australian travellers,693,123,153,17,113,4,103,0,7,600
20030219,ambitious olsson wins triple jump,488,57,161,0,0,0,0,0,0,488
20030219,antic delighted with record breaking barca,386,60,80,3,4,0,93,0,68,228
20030219,aussie qualifier stosur wastes four memphis match,751,45,297,0,0,0,0,0,0,751
20030219,aust addresses un security council over iraq,3847,622,141,1,0,0,0,0,0,3848
20030219,australia is locked into war timetable opp,1330,205,874,0,0,0,0,0,0,1330
20030219,australia to contribute 10 million in aid to iraq,3530,130,0,23,16,4,1,0,0,3552
20030219,barca take record as robson celebrates birthday in,13875,331,484,0,0,0,0,0,0,13875
20030219,bathhouse plans move ahead,11202,450,2576,433,51,20,4,0,34,11597
20030219,big hopes for launceston cycling championship,3988,445,955,0,0,0,0,0,0,3988
20030219,big plan to boost paroo water supplies,460,101,92,0,0,0,0,0,0,460
20030219,blizzard buries united states in bills,303,223,193,0,0,0,0,0,0,303
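To illustrate why looking fields up by name helps, here is a sketch that runs the same logic on a file whose columns are in a different order. The file name shuffled.csv and its reduced column set are assumptions made up for this example; the numbers are taken from two rows of the question's sample.

```shell
# Hypothetical input: only the five columns the formula needs, in shuffled order.
cat > shuffled.csv <<'EOF'
angry_count,likes_count,sad_count,love_count,thankful_count
6,1106,2,155,0
90,826,7,105,0
EOF

awk 'BEGIN { FS = OFS = "," }
# Header line: add the new column name, then map every name to its position.
NR == 1 { $(NF+1) = "emotional_polarity"; for (i = 1; i <= NF; i++) f[$i] = i }
# Data lines: address each field through the name-to-position map.
NR > 1  { $(f["emotional_polarity"]) = ($(f["likes_count"]) + $(f["love_count"]) + $(f["thankful_count"])) - ($(f["angry_count"]) + $(f["sad_count"])) }
{ print }' shuffled.csv
# prints:
# angry_count,likes_count,sad_count,love_count,thankful_count,emotional_polarity
# 6,1106,2,155,0,1253
# 90,826,7,105,0,834
```

The results match the rows for the same values in the output above, even though no hard-coded field number such as $3 or $11 would be correct for this layout.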
Answer 3
I would make two changes to what you tried. This is your command:
awk -F , '{$12=$3+$6+$10-$11-$9;}{print }' file
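Setting OFS="," in a BEGIN block gets half the job done: that is how the fields are separated when they are printed. Then if(NR==1) $NF="emotional_polarity" does the other half. Although $(NF+1) is better than $12, I will stick with $12 here. $12=$a+..$b adds another field to $0, which increases NF by 1. On the header line that sum is just 0, because the column names are not numbers, so the if statement changes the last field of line 1 (NR==1) to "emotional_polarity". Now I put these two expressions into your command: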
awk -F , 'BEGIN{OFS=","}{$12=$3+$6+$10-$11-$9; if(NR==1) $NF="emotional_polarity"}{print }' file
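The way NF grows when a new field is assigned, and the way $NF then points at that new field, can be checked with a short sketch. The a,b,c record and the values x and renamed are made up for illustration.

```shell
printf 'a,b,c\n' | awk -F, 'BEGIN { OFS = "," }
{
    print NF          # 3 fields were read
    $4 = "x"          # assigning past the last field creates a new one
    print NF          # NF is now 4
    $NF = "renamed"   # $NF therefore refers to the field just added
    print
}'
# prints:
# 3
# 4
# a,b,c,renamed
```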
I also tried it with arrays, like this (the arr[i][j] syntax needs GNU awk):
awk -F',' 'BEGIN{OFS=","}
{arr[NR][1]=$0; arr[NR][2]=$3+$6+$10-$11-$9;}
END {
arr[1][2]="emotional_polarity";
for(i=1;i<=NR;i++) print arr[i][1], arr[i][2] }' file
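arr[NR][1] stores each input line ($0), while arr[NR][2] stores the result of the calculation.
In the END block, we set arr[1][2] to "emotional_polarity" because that is the name we want for the new column. Then we tell awk to print everything.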
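Arrays of arrays such as arr[NR][1] are a GNU awk extension. As a sketch of the same idea for other awks, two ordinary one-dimensional arrays can be used instead. The array names line and score and the file name sample.csv are made up for this example; the two input lines are copied from the question.

```shell
cat > sample.csv <<'EOF'
publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count
20030219,aba decides against community broadcasting licence,1106,118,109,155,6,5,2,0,6
EOF

awk -F, 'BEGIN { OFS = "," }
# Remember every input line and its computed value, indexed by line number.
{ line[NR] = $0; score[NR] = $3 + $6 + $10 - $11 - $9 }
END {
    score[1] = "emotional_polarity"   # the header gets the column name, not a sum
    for (i = 1; i <= NR; i++) print line[i], score[i]
}' sample.csv
# prints:
# publish_date,headline_text,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count,emotional_polarity
# 20030219,aba decides against community broadcasting licence,1106,118,109,155,6,5,2,0,6,1253
```

Like the array version above, this holds the whole file in memory until END, so the one-pass commands from the other answers are probably a better fit for large files.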