Can anyone help me with this? I have an extracted file whose contents look like this:
(11213068, 2020-11-16) deleted
(1075227404, 2021-06-14) added
(11213177, 2020-11-16) deleted
(1075227413, 2021-06-14) added
(11213070, 2020-11-16) deleted
(1075193958, 2021-05-28) added
(1075194668, 2022-11-29) added
(1073757334, 2021-01-20) (1073757337, 2021-01-20) (1073757349, 2021-01-20) (1073757331, 2021-01-20) (1073757346, 2021-01-20) added
(1073757237, 2020-11-20) (1073757263, 2020-11-20) (1073757233, 2020-11-20) (1073757241, 2020-11-20) (1073757247, 2020-11-20) deleted
The result I want looks like this:
(11213068, 2020-11-16) delete
(1075227404, 2021-06-14) add
(11213177, 2020-11-16) delete
(1075227413, 2021-06-14) add
(11213070, 2020-11-16) delete
(1075193958, 2021-05-28) add
(1075194668, 2022-11-29) add
(1073757334, 2021-01-20) add
(1073757337, 2021-01-20) add
(1073757349, 2021-01-20) add
(1073757331, 2021-01-20) add
(1073757346, 2021-01-20) add
(1073757237, 2020-11-20) delete
(1073757263, 2020-11-20) delete
(1073757233, 2020-11-20) delete
(1073757241, 2020-11-20) delete
(1073757247, 2020-11-20) delete
I can't find a solution for the last two lines. This is the output of the command I'm using:
awk '$3!="added"' | awk '$3!="deleted"' | sed 's/) (/\n/g' file.txt
(11213068, 2020-11-16) deleted
(1075227404, 2021-06-14) added
(11213177, 2020-11-16) deleted
(1075227413, 2021-06-14) added
(11213070, 2020-11-16) deleted
(1075193958, 2021-05-28) added
(1075194668, 2022-11-29) added
(1073757334, 2021-01-20
1073757337, 2021-01-20
1073757349, 2021-01-20
1073757331, 2021-01-20
1073757346, 2021-01-20) added
(1073757237, 2020-11-20
1073757263, 2020-11-20
1073757233, 2020-11-20
1073757241, 2020-11-20
1073757247, 2020-11-20) deleted
Thanks for your time.
Answer 1
Use the right field separator for this.
awk -F') ' '{for (i=1;i<NF;i++) print $i FS $NF}' file
If you also need to replace the last field, there are several ways to do it, for example calling sub() at the start of the line processing.
awk -F') ' '{sub(/added$/,"add"); sub(/deleted$/,"delete"); for (i=1;i<NF;i++) print $i FS $NF}' file
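To see why the ") " separator works: for a line such as (1, x) (2, y) added, the fields are "(1, x", "(2, y" and "added", so each print re-attaches the ")" and the last field to one tuple. Below is a minimal sketch that packages this idea as a shell function. The name split_tuples is hypothetical, and it uses the bracket expression "[)] " as the separator on the assumption that an unmatched ")" in a regex might be handled differently by some awk implementations. Because a multi-character FS is then a regex rather than a literal string, the sketch prints a literal ") " instead of FS.

```shell
# Hypothetical helper: split lines like "(id, date) (id, date) added"
# into one "(id, date) add" line per tuple. Reads stdin, writes stdout.
split_tuples() {
    # FS is the regex "[)] " (a literal ')' followed by a space), written as a
    # bracket expression so the ')' is never treated as a grouping operator.
    awk -F'[)] ' '{
        # Normalise the status word first; modifying $0 with sub()
        # makes awk re-split the record into fields.
        sub(/added$/, "add")
        sub(/deleted$/, "delete")
        # Fields 1..NF-1 are tuples missing their ")"; $NF is the status.
        for (i = 1; i < NF; i++)
            print $i ") " $NF
    }'
}

# Usage with two of the sample lines from the question.
printf '%s\n' \
    '(11213068, 2020-11-16) deleted' \
    '(1073757334, 2021-01-20) (1073757337, 2021-01-20) added' |
    split_tuples
# Prints:
# (11213068, 2020-11-16) delete
# (1073757334, 2021-01-20) add
# (1073757337, 2021-01-20) add
```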
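Answer 2
GNU sed with extended regex mode -E.
Mark the regions sandwiched between ) and ( with a newline marker. Then move the last field (after its past tense has been cleaned up) in front of the first marker, print up to the first marker, and chop off everything up to the first marker. Repeat this until the pattern space is exhausted.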
$ sed -Ee '/\n/ba
/e?d$/s/ (add|delete)e?d$/ \1/
s/[)] [(]/) \n(/g;:a
s/(\n.*)?\n.* (\S+)$/\2&/
/\n.*\n/{P;D;}
' file
$ perl -F'\)\s' -lane '$, = ") ";
my $l = pop(@F) =~
s/^(add)ed$/$1/r =~
s/^(delete)d$/$1/r;
print $_, $l for @F;
' file
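To watch the marking step of the sed script in isolation, the hypothetical helper show_marks below (a debugging aid, not part of the original answer) runs only the tense cleanup and the ") (" substitution, then uses GNU sed's l command, which prints the pattern space with embedded newlines shown as \n and the end of the buffer marked with $. It assumes GNU sed, since \n in the replacement text is a GNU extension.

```shell
# Requires GNU sed (-E together with \n in the replacement).
# Hypothetical helper that stops after the marking step of the sed answer.
show_marks() {
    # 1) drop the past tense of the status word,
    # 2) put a newline marker between ") " and "(",
    # 3) "l" prints the pattern space unambiguously (\n for newlines, $ at end).
    sed -nE 's/ (add|delete)e?d$/ \1/; s/[)] [(]/) \n(/g; l'
}

printf '%s\n' '(1073757334, 2021-01-20) (1073757337, 2021-01-20) added' | show_marks
# Prints: (1073757334, 2021-01-20) \n(1073757337, 2021-01-20) add$
```

From this state, the full script moves "add" in front of the first \n, prints with P, deletes with D, and loops until only one marker-free tuple is left.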
Answer 3
Maybe a two-stage solution?
<infile sed 's/deleted/delete/; s/added/add/' |
awk 'NF==3; NF>3 { for (i=1; i<NF; i+=2) print $i, $(i+1), $NF }'
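The second stage relies on awk's default whitespace splitting: each tuple occupies two fields, for example "(1073757334," and "2021-01-20)", so a line holding k tuples has 2k+1 fields. NF==3 with no action prints single-tuple lines as they are, and for longer lines the loop steps by two to pair each id field with its date field. Below is a sketch of the same pipeline wrapped in a hypothetical function two_stage; the $ anchors on the substitutions are an addition that assumes the status word is always at the end of the line.

```shell
# Hypothetical wrapper around the two-stage pipeline from this answer.
two_stage() {
    # Stage 1: drop the past tense of the trailing status word.
    sed 's/deleted$/delete/; s/added$/add/' |
    # Stage 2: with default whitespace splitting, a tuple "(id, date)" spans
    # two fields, so a line holding k tuples has NF == 2k + 1.
    awk '
        NF == 3                   # one tuple: a pattern with no action prints $0
        NF > 3 {                  # several tuples: walk them two fields at a time
            for (i = 1; i < NF; i += 2)
                print $i, $(i + 1), $NF
        }'
}

printf '%s\n' '(1073757237, 2020-11-20) (1073757263, 2020-11-20) deleted' | two_stage
```

The NF > 3 loop would also handle three-field lines correctly; the separate NF == 3 rule just passes them through with their original spacing.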
Answer 4
Using GNU awk for FPAT:
$ awk -v FPAT='[(][^)]+)|\\S+' '{for (i=1; i<NF; i++) print $i, $NF}' file
(11213068, 2020-11-16) deleted
(1075227404, 2021-06-14) added
(11213177, 2020-11-16) deleted
(1075227413, 2021-06-14) added
(11213070, 2020-11-16) deleted
(1075193958, 2021-05-28) added
(1075194668, 2022-11-29) added
(1073757334, 2021-01-20) added
(1073757337, 2021-01-20) added
(1073757349, 2021-01-20) added
(1073757331, 2021-01-20) added
(1073757346, 2021-01-20) added
(1073757237, 2020-11-20) deleted
(1073757263, 2020-11-20) deleted
(1073757233, 2020-11-20) deleted
(1073757241, 2020-11-20) deleted
(1073757247, 2020-11-20) deleted
Or, if you really want to change those last words:
$ awk -v FPAT='[(][^)]+)|\\S+' '
BEGIN { map["deleted"]="delete"; map["added"]="add" }
{ for (i=1; i<NF; i++) print $i, map[$NF] }
' file
(11213068, 2020-11-16) delete
(1075227404, 2021-06-14) add
(11213177, 2020-11-16) delete
(1075227413, 2021-06-14) add
(11213070, 2020-11-16) delete
(1075193958, 2021-05-28) add
(1075194668, 2022-11-29) add
(1073757334, 2021-01-20) add
(1073757337, 2021-01-20) add
(1073757349, 2021-01-20) add
(1073757331, 2021-01-20) add
(1073757346, 2021-01-20) add
(1073757237, 2020-11-20) delete
(1073757263, 2020-11-20) delete
(1073757233, 2020-11-20) delete
(1073757241, 2020-11-20) delete
(1073757247, 2020-11-20) delete
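FPAT is a gawk extension. On an awk without it (for example mawk or BSD awk), a comparable effect can be had by repeatedly calling the POSIX match() function and consuming the line with substr(). The sketch below is such an alternative, not part of the original answer; the function name posix_split is hypothetical, and it assumes the status word is always the last whitespace-separated field. Unlike map[$NF], it falls back to the original word when the status is not in the map.

```shell
# Hypothetical portable alternative to the FPAT answer (no gawk extensions).
posix_split() {
    awk '
        BEGIN { map["deleted"] = "delete"; map["added"] = "add" }
        {
            status = $NF                  # last whitespace-separated field
            if (status in map) status = map[status]
            rest = $0
            # match() sets RSTART/RLENGTH for the leftmost "(...)" group;
            # print it, then continue scanning after it.
            while (match(rest, /[(][^)]*[)]/)) {
                print substr(rest, RSTART, RLENGTH), status
                rest = substr(rest, RSTART + RLENGTH)
            }
        }'
}

printf '%s\n' '(1073757334, 2021-01-20) (1073757337, 2021-01-20) added' | posix_split
```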