sed or awk: reformat a line by copying part of it to the end of the line

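I have a CSV file that I want to edit before importing it into an SQLite database. It has thousands of lines, and I want to copy part of each line and append it to the end with a pipe "|", so that it can easily be delimited and imported into the database.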

The CSV contains lines like these:

989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked -  http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) -  http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) -  http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|

I want to copy the number that starts with 765 and append it to the end of the line, like this:

989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) -  http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591

I want to do this for every line in the CSV. So maybe a for loop is needed? I don't know.
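sed and awk already apply their script to every input line, so no explicit loop is needed. For comparison, here is roughly what that implicit loop does, sketched as a hypothetical bash function (`append_profile_id` is an illustrative name, not part of any answer below). It assumes at most one `/profiles/<digits>` URL per line and uses bash's `=~` operator; it will be noticeably slower than sed or awk on large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines on stdin and appends the number that
# follows "/profiles/" to the end of each line. Lines without such a
# number are printed unchanged. Assumes at most one profile URL per line.
append_profile_id() {
  local line
  # "|| [ -n "$line" ]" also handles a last line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line =~ /profiles/([0-9]+) ]]; then
      # BASH_REMATCH[1] holds the digits captured by ([0-9]+)
      printf '%s%s\n' "$line" "${BASH_REMATCH[1]}"
    else
      printf '%s\n' "$line"
    fi
  done
}
```

Usage: `append_profile_id < file.csv > out.csv`.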

Answer 1

sed solution:

sed -E 's/.*\/profiles\/([0-9]+).*/&\1/' file.csv

Sample output:

989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked -  http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|76561198402636850
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) -  http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|76561198006267103
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) -  http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591

Answer 2

awk

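awk -F'["/]' '{ split($(NF-1), id, " "); print $0 id[1] }' infile > outfile

This prints the whole line $0 followed by the ID taken from the second-to-last field $(NF-1), where the field separator -F is the bracket expression '[...]' matching either a double quote " or a slash /. That field also contains the trailing " …" after the number, so split() keeps only its first space-separated word. Input is read from infile and the result is saved to outfile.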
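The field-splitting approach depends on exactly where the number sits between the separators. A hypothetical alternative (the function name is illustrative) uses awk's `match()` to find "profiles/" followed by digits and appends only the digits, so any text after the number, such as the " …" in the sample lines, is not copied. Lines without a match pass through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: match()-based awk variant. Searches each line for
# "profiles/" followed by digits and appends just the digits.
# Reads the files given as arguments, or stdin if there are none.
append_profile_id_awk() {
  awk '{
    if (match($0, /profiles\/[0-9]+/)) {
      # RSTART/RLENGTH describe the match; skip the 9 characters of "profiles/"
      print $0 substr($0, RSTART + 9, RLENGTH - 9)
    } else {
      print $0
    }
  }' "$@"
}
```

Usage: `append_profile_id_awk infile > outfile`.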

Answer 3

$ sed -E 'h;s/.*(http[^ ]*).*/\1/;s/.*\///;H;x;s/\n//' file
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked -  http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|76561198402636850
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) -  http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|76561198006267103
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) -  http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591

The sed script, annotated:

h                        # save a copy of the current line in the "hold space"
s/.*(http[^ ]*).*/\1/    # remove everything but the URL
s/.*\///                 # trim the URL so that only the last bit (the number) is left
H                        # add that last bit to the "hold space" (with a newline in-between)
x                        # swap the "hold space" and the "pattern space"
s/\n//                   # delete that inserted newline
                         # (implicit print at the end)

This assumes that the URL is always the only URL on the line and that it is always delimited by space characters.
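The one-liners above all rely on each line containing exactly one profile URL. A quick way to check a file before running them, sketched here as a hypothetical helper, is to count occurrences per line with awk's `gsub()`, which returns the number of substitutions it made. It is only a heuristic: it counts every occurrence of the text "http", whether or not it is part of a URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports lines that break the "exactly one URL"
# assumption, printing the line number and how many "http" matches were found.
# Prints nothing if every line has exactly one.
check_one_url_per_line() {
  awk '{
    # replacing "http" with itself ("&") counts occurrences without changing the line
    n = gsub(/http/, "&")
    if (n != 1) print NR ": " n " URLs"
  }' "$@"
}
```

Usage: `check_one_url_per_line file.csv`.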
