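I have a CSV file that I want to edit before importing it into an SQLite database. It has thousands of rows, and I want to copy part of each row and append it to the end, separated by a pipe "|", so that it can easily be split and imported into the database.
The CSV contains rows like these: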
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked - http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|
I want to copy the number that starts with 765 and add it to the end of the line, like this:
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591
I want to do this for every line in the CSV, so maybe a for loop is needed. I don't know.
Answer 1
sed solution:
sed -E 's/.*\/profiles\/([0-9]+).*/&\1/' file.csv
Sample output:
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked - http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|76561198402636850
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|76561198006267103
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591
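The substitution works because the pattern is wrapped in `.*` on both sides and therefore matches the whole line: `&` re-emits that whole match unchanged, and `\1` appends the digits captured after `/profiles/`. A minimal sketch of this, assuming a hypothetical helper name `append_profile_id` (not part of the answer) and one row from the question as input:

```shell
# append_profile_id is a hypothetical helper name used only for illustration.
append_profile_id() {
    # & is the entire matched line; \1 is the digit run right after /profiles/.
    # With no file arguments, sed reads standard input.
    sed -E 's/.*\/profiles\/([0-9]+).*/&\1/' "$@"
}

sample='989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|'
result=$(printf '%s\n' "$sample" | append_profile_id)
printf '%s\n' "$result"
```

Lines that contain no `/profiles/` followed by digits do not match the pattern and pass through unchanged. To write the result to a new file instead of the terminal, the function could be called as `append_profile_id file.csv > new.csv` (file names are placeholders).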
Answer 2
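With awk:
awk -F'["/]' '{split($(NF-1), a, " "); print $0 a[1]}' infile > outfile
The field separator (-F) is the character class '["/]', that is, either a double quote " or a slash /. The second-to-last field, $(NF-1), is then the text between the last slash and the closing quote; in the sample rows that is the number followed by " …". split() breaks that field on whitespace so that a[1] holds only the number, and print writes the whole line $0 with a[1] appended. The input is read from infile and the result is saved to outfile.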
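To see what the separator '["/]' does to one of the sample rows, the sketch below prints the last two fields and then pulls out the first whitespace-separated word of the second-to-last one. The variable names `line`, `field`, `id`, and `part` are illustrative only.

```shell
line='989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|'

# Show the last two fields when the line is split on " or /.
printf '%s\n' "$line" |
    awk -F'["/]' '{ printf "NF-1: [%s]\nNF:   [%s]\n", $(NF-1), $NF }'

# The second-to-last field as awk sees it (number plus trailing text).
field=$(printf '%s\n' "$line" | awk -F'["/]' '{ print $(NF-1) }')

# Keep only the first whitespace-separated word of that field.
id=$(printf '%s\n' "$line" |
    awk -F'["/]' '{ split($(NF-1), part, " "); print part[1] }')
printf '%s\n' "$id"
```

The number is kept as a string throughout. Converting it to a number in awk is best avoided here, since a 17-digit ID may not be represented exactly in floating point.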
Answer 3
$ sed -E 'h;s/.*(http[^ ]*).*/\1/;s/.*\///;H;x;s/\n//' file
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked - http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|76561198402636850
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|76561198006267103
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591
The sed script, annotated:
h # save a copy of the current line in the "hold space"
s/.*(http[^ ]*).*/\1/ # remove everything but the URL
s/.*\/// # trim the URL so that only the last bit (the number) is left
H # add that last bit to the "hold space" (with a newline in-between)
x # swap the "hold space" and the "pattern space"
s/\n// # delete that inserted newline
# (implicit print at the end)
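The annotated steps can be observed by printing the pattern space after each substitution. This is only a tracing sketch: it adds `-n` and explicit `p` commands to the same script, and the variable names `line` and `trace` are illustrative.

```shell
line='989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|'

# -n turns off the implicit print, so only the explicit p commands produce output:
#   1st p: pattern space after everything but the URL is removed
#   2nd p: pattern space after the URL is trimmed to its last component
#   3rd p: original line (from the hold space) with the number appended
trace=$(printf '%s\n' "$line" |
    sed -n -E 'h;s/.*(http[^ ]*).*/\1/;p;s/.*\///;p;H;x;s/\n//;p')
printf '%s\n' "$trace"
```

The first printed line should be the bare URL, the second the number alone, and the third the full row with the number appended.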
This script assumes that the URL is always the only URL on the line, and that it is always terminated by a space character.
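One way to check the single-URL assumption before running the script is to count URL prefixes per line and report any line where the count is not exactly one. A rough sketch, assuming a hypothetical helper name `check_urls` and treating every `http://` or `https://` as the start of a URL:

```shell
# check_urls is a hypothetical helper name used only for illustration.
check_urls() {
    # gsub returns the number of matches it replaced; replacing with "&"
    # puts the matched text back, so the line itself is not altered.
    awk '{ n = gsub(/https?:\/\//, "&") }
         n != 1 { printf "line %d: %d URLs\n", NR, n }' "$@"
}

# Demonstration on three made-up lines: one URL, two URLs, no URL.
report=$(printf '%s\n' 'a http://x/1 b' 'http://x/1 http://y/2 c' 'none' | check_urls)
printf '%s\n' "$report"
```

On the real data this could be run as `check_urls file.csv` (placeholder file name); no output means every line has exactly one URL prefix.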