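I have a CSV file from which I want to extract only columns 7 and 11. Column 7 contains either OK or KO, and based on that value I want to insert a new column next to it (called value) with the following mapping: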
OK -> 0
KO -> 1
I also add a simple constant string column at the beginning.
My .awk file looks like this:
BEGIN {FS=";"; OFS=","}
{
    value=0
    if ($7=="KO") {
        value=1
    }
    print "Measure_QS",$7,value,$11
}
Running it with:
gawk -f converter.awk Dataset.csv | head -n 10
gives the following:
Measure_QS,result,0,time_stamp
Measure_QS,OK,0,2020-01-17 11:53:33.000
Measure_QS,OK,0,2020-01-17 11:53:22.000
Measure_QS,OK,0,2020-01-17 11:51:42.000
Measure_QS,OK,0,2020-01-17 11:51:30.000
Measure_QS,OK,0,2020-01-17 11:51:06.000
Measure_QS,OK,0,2020-01-17 11:50:53.000
Measure_QS,OK,0,2020-01-17 11:50:41.000
Measure_QS,OK,0,2020-01-17 11:50:29.000
Measure_QS,OK,0,2020-01-17 11:50:17.000
The header comes out as Measure_QS,result,0,time_stamp, but I want it to be Measure_QS,result,value,time_stamp.
Where am I going wrong?
Answer 1
I was able to solve this using the NR variable in gawk:
BEGIN {FS=";"; OFS=","; print "measurement","result","value","time_stamp"}
{
    value=0
    if (NR!=1) {
        if ($7=="KO") {
            value=1
        }
        print "Measure_QS",$7,value,$11
    }
}
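The header problem arises because row 1 is processed like any data row: its column 7 holds "result", which is not "KO", so value stays 0. A common awk idiom is to handle the first record in its own NR == 1 rule and then use next to skip the data-row logic. The sketch below shows that idiom under some assumptions: sample.csv and its contents are hypothetical (11 semicolon-separated columns, header in row 1), and the convert function name is made up for illustration. It reuses the original header's column 7 and 11 names, so the header comes out as Measure_QS,result,value,time_stamp.

```shell
#!/bin/sh
# Hypothetical sample input: 11 semicolon-separated columns,
# header in row 1, OK/KO in column 7, timestamp in column 11.
cat > sample.csv <<'EOF'
c1;c2;c3;c4;c5;c6;result;c8;c9;c10;time_stamp
a;b;c;d;e;f;OK;h;i;j;2020-01-17 11:53:33.000
a;b;c;d;e;f;KO;h;i;j;2020-01-17 11:53:22.000
EOF

# convert is a hypothetical helper name, not part of the original post.
convert() {
    awk 'BEGIN { FS = ";"; OFS = "," }
    NR == 1 {
        # Header row: keep the original names of columns 7 and 11,
        # label the new column "value", then skip to the next record.
        print "Measure_QS", $7, "value", $11
        next
    }
    {
        # Data rows: map KO -> 1 and anything else (OK) -> 0.
        print "Measure_QS", $7, ($7 == "KO" ? 1 : 0), $11
    }' "$1"
}

convert sample.csv
```

Because next stops processing of the header record, the data-row rule never sees it, so no nested if (NR != 1) is needed.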
This works as expected with:
gawk -f converter.awk Dataset.csv | head -n 10
measurement,result,value,time_stamp
Measure_QS,OK,0,2020-01-17 11:53:33.000
Measure_QS,OK,0,2020-01-17 11:53:22.000
Measure_QS,OK,0,2020-01-17 11:51:42.000
Measure_QS,OK,0,2020-01-17 11:51:30.000
Measure_QS,OK,0,2020-01-17 11:51:06.000
Measure_QS,OK,0,2020-01-17 11:50:53.000
Measure_QS,OK,0,2020-01-17 11:50:41.000
Measure_QS,OK,0,2020-01-17 11:50:29.000
Measure_QS,OK,0,2020-01-17 11:50:17.000