表1.csv
DATE, TIMESTAMP, ID, START TIME, END TIME, DURATION
2019-04-05, 13:57:19, 1607,13:06:42, 13:07:12, 00:00:30
2019-04-05, 13:58:00, 2327,13:57:26, 13:57:43, 00:00:17
2019-04-24, 12:30:00, 1836,11:20:01, 12:30:00, 01:09:59
2019-04-24, 12:30:00, 1836,11:20:01, 12:30:00, 01:09:59
2019-04-24, 15:30:01, 1836,14:50:01, 15:30:01, 00:40:00
2019-04-24, 15:30:01, 1836,14:50:01, 15:30:01, 00:40:00
表2.csv
KEY, ID, NAME
407, 1607, RECORD1
1127,2327, RECORD2
636, 1836, RECORD3
664, 1864, RECORD4
703, 1903, RECORD5
匹配 TABLE1.csv 中的第 3 列和 TABLE2.csv 中的第 2 列以及所需的输出,如下所示:
DATE, TIMESTAMP, ID, NAME, START TIME, END TIME, DURATION
2019-04-05, 13:57:19, 1607, RECORD1, 13:06:42, 13:07:12, 00:00:30
2019-04-05, 13:58:00, 2327, RECORD2, 13:57:26, 13:57:43, 00:00:17
2019-04-24, 12:30:00, 1836, RECORD3, 11:20:01, 12:30:00, 01:09:59
2019-04-24, 12:30:00, 1836, RECORD3, 11:20:01, 12:30:00, 01:09:59
2019-04-24, 15:30:01, 1836, RECORD3, 14:50:01, 15:30:01, 00:40:00
2019-04-24, 15:30:01, 1836, RECORD3, 14:50:01, 15:30:01, 00:40:00
我已经尝试过每个例子
awk -F',' 'FNR==NR.....
答案1
这个sed
收集保留空间中的替换TABLE2.csv
,然后执行列的添加:
sed 's/^[^,]*, *\([0-9]*,[^,]*\)$/\1/
T2
H;d
:2
G
s/\([^,]*,[^,]*, *\)\([0-9]*,\)\([^[:cntrl:]]*\)\n.*\n\2\([^[:cntrl:]]*\)/\1\2\4, \3/
P;d' TABLE2.csv TABLE1.csv
如果有必要,您必须为更改后的标题添加一些内容。
s/^[^,]*, *\([0-9]*,[^,]*\)$/\1/
TABLE2.csv
确实从行中提取第二列和第三列T2
:2
如果没有进行替换,则跳转到(因此,对于TABLE2.csv
行)- 更改的
TABLE2.csv
行将追加到H
旧空间,然后d
删除(停止执行该行) :2
是跳跃标记。以下所有内容TABLE1.csv
仅针对线路完成G
将查找表从保留空间追加到模式空间s/\([^,]*,[^,]*, *\)\([0-9]*,\)\([^[:cntrl:]]*\)\n.*\n\2\([^[:cntrl:]]*\)/\1\2\4, \3/
在查找表中查找 ID ([0-9]*,
) 的出现并添加匹配字段P
打印该行(没有添加的查找表)并d
停止该行的执行。
更详细的解释请参考如何在一个文件中定义的内容替换另一文件
答案2
我会用awk为了这
awk -F', *' -v OFS=', ' '
NR == FNR { name[$2] = $3; next }
{ $3 = $3 OFS name[$3]; print }
' TABLE{2,1}.csv
输出
DATE, TIMESTAMP, ID, NAME, START TIME, END TIME, DURATION
2019-04-05, 13:57:19, 1607, RECORD1, 13:06:42, 13:07:12, 00:00:30
2019-04-05, 13:58:00, 2327, RECORD2, 13:57:26, 13:57:43, 00:00:17
2019-04-24, 12:30:00, 1836, RECORD3, 11:20:01, 12:30:00, 01:09:59
2019-04-24, 12:30:00, 1836, RECORD3, 11:20:01, 12:30:00, 01:09:59
2019-04-24, 15:30:01, 1836, RECORD3, 14:50:01, 15:30:01, 00:40:00
2019-04-24, 15:30:01, 1836, RECORD3, 14:50:01, 15:30:01, 00:40:00
首先读取 TABLE2 并将 ID 映射到 NAME。
然后读取 TABLE1 并将 NAME 添加到 ID 字段。
设置列间距的格式由您决定。