Match a pattern, add a newline, and append a word to the end of each line


Can someone help me with this? I have an extracted file whose contents look like this:

(11213068, 2020-11-16) deleted
(1075227404, 2021-06-14) added
(11213177, 2020-11-16) deleted
(1075227413, 2021-06-14) added
(11213070, 2020-11-16) deleted
(1075193958, 2021-05-28) added
(1075194668, 2022-11-29) added
(1073757334, 2021-01-20) (1073757337, 2021-01-20) (1073757349, 2021-01-20) (1073757331, 2021-01-20) (1073757346, 2021-01-20) added
(1073757237, 2020-11-20) (1073757263, 2020-11-20) (1073757233, 2020-11-20) (1073757241, 2020-11-20) (1073757247, 2020-11-20) deleted

The result I want looks like this:

(11213068, 2020-11-16) delete
(1075227404, 2021-06-14) add
(11213177, 2020-11-16) delete
(1075227413, 2021-06-14) add
(11213070, 2020-11-16) delete
(1075193958, 2021-05-28) add
(1075194668, 2022-11-29) add
(1073757334, 2021-01-20) add
(1073757337, 2021-01-20) add
(1073757349, 2021-01-20) add
(1073757331, 2021-01-20) add
(1073757346, 2021-01-20) add
(1073757237, 2020-11-20) delete
(1073757263, 2020-11-20) delete
(1073757233, 2020-11-20) delete
(1073757241, 2020-11-20) delete
(1073757247, 2020-11-20) delete

I can't find a solution for the last two lines. This is the command I'm using, followed by its output:

awk '$3!="added"' | awk '$3!="deleted"' | sed 's/) (/\n/g' file.txt

(11213068, 2020-11-16) deleted
(1075227404, 2021-06-14) added
(11213177, 2020-11-16) deleted
(1075227413, 2021-06-14) added
(11213070, 2020-11-16) deleted
(1075193958, 2021-05-28) added
(1075194668, 2022-11-29) added
(1073757334, 2021-01-20
1073757337, 2021-01-20
1073757349, 2021-01-20
1073757331, 2021-01-20
1073757346, 2021-01-20) added
(1073757237, 2020-11-20
1073757263, 2020-11-20
1073757233, 2020-11-20
1073757241, 2020-11-20
1073757247, 2020-11-20) deleted

Thank you for your time.

Answer 1

Use the right field separator for this.

awk -F') ' '{for (i=1;i<NF;i++) print $i FS $NF}' file

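If you also need to replace the last field, there are several ways to do it, for example by calling sub() at the start of processing each line.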

awk -F') ' '{sub(/added$/,"add"); sub(/deleted$/,"delete"); for (i=1;i<NF;i++) print $i FS $NF}' file
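A multi-character -F value is treated as a regular expression, and awk implementations may differ in how they handle the unmatched ) in it. As a sketch of a more portable variant, assuming only a POSIX awk, the separator can be written as the bracket expression [)] followed by a space. FS then holds the regex text rather than the literal separator, so the ") " is printed explicitly. The function name split_entries is made up for this illustration.

```shell
# Hypothetical helper: reads records on stdin, writes one entry per line.
split_entries() {
    awk -F'[)] ' '
    {
        # Drop the past-tense suffix from the trailing word.
        sub(/added$/, "add")
        sub(/deleted$/, "delete")
        # Every field except the last is an entry that lost its ") " separator;
        # put the separator back and append the action word.
        for (i = 1; i < NF; i++) print $i ") " $NF
    }'
}

# Example run on two lines shaped like the sample data.
printf '%s\n' \
    '(11213068, 2020-11-16) deleted' \
    '(1073757334, 2021-01-20) (1073757337, 2021-01-20) added' |
    split_entries
```

Because sub() modifies $0 before the loop runs, awk re-splits the record and $NF already holds the shortened word.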

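Answer 2

GNU sed in extended regular expression mode (-E):

  • Mark the gap sandwiched between ) and ( with a newline marker. Then copy the last field (after its past-tense cleanup) to the first marker, print up to the first marker, and delete up to the first marker. Repeat this process until the pattern space is exhausted.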

$ sed -Ee '/\n/ba
    /e?d$/s/ (add|delete)e?d$/ \1/
    s/[)] [(]/) \n(/g;:a
    s/(\n.*)?\n.* (\S+)$/\2&/
    /\n.*\n/{P;D;}
' file

$ perl -F'\)\s' -lane '$, = ") ";
    my $l = pop(@F) =~
     s/^(add)ed$/$1/r =~
      s/^(delete)d$/$1/r;
    print $_, $l for @F;
' file

Answer 3

Perhaps a two-stage solution?

<infile sed 's/deleted/delete/; s/added/add/' | 
awk 'NF==3; NF>3 { for (i=1; i<NF; i+=2) print $i, $(i+1), $NF }'
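The second stage works because, with default whitespace splitting, each entry occupies exactly two fields (the "(id," half and the "date)" half) and the action word is always the last field. As a rough sketch of the same idea in a single awk process, assuming the input always has that shape, the word can be shortened by assigning to $NF. The function name normalize_entries is hypothetical.

```shell
# Hypothetical helper: both stages of the answer in one awk program.
normalize_entries() {
    awk '
    {
        # Stage 1 (done by sed in the answer): shorten the trailing word.
        if ($NF == "added") $NF = "add"
        else if ($NF == "deleted") $NF = "delete"
        # Stage 2: fields i and i+1 are the two halves of one entry.
        for (i = 1; i < NF; i += 2) print $i, $(i + 1), $NF
    }'
}

# Example run on two lines shaped like the sample data.
printf '%s\n' \
    '(11213070, 2020-11-16) deleted' \
    '(1073757237, 2020-11-20) (1073757263, 2020-11-20) deleted' |
    normalize_entries
```

The loop also covers the three-field case, so no separate NF==3 rule is needed in this variant.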

Answer 4

Using GNU awk for FPAT:

$ awk -v FPAT='[(][^)]+)|\\S+' '{for (i=1; i<NF; i++) print $i, $NF}' file
(11213068, 2020-11-16) deleted
(1075227404, 2021-06-14) added
(11213177, 2020-11-16) deleted
(1075227413, 2021-06-14) added
(11213070, 2020-11-16) deleted
(1075193958, 2021-05-28) added
(1075194668, 2022-11-29) added
(1073757334, 2021-01-20) added
(1073757337, 2021-01-20) added
(1073757349, 2021-01-20) added
(1073757331, 2021-01-20) added
(1073757346, 2021-01-20) added
(1073757237, 2020-11-20) deleted
(1073757263, 2020-11-20) deleted
(1073757233, 2020-11-20) deleted
(1073757241, 2020-11-20) deleted
(1073757247, 2020-11-20) deleted

Or, if you really do want to change those last words:

$ awk -v FPAT='[(][^)]+)|\\S+' '
    BEGIN { map["deleted"]="delete"; map["added"]="add" }
    { for (i=1; i<NF; i++) print $i, map[$NF] }
' file
(11213068, 2020-11-16) delete
(1075227404, 2021-06-14) add
(11213177, 2020-11-16) delete
(1075227413, 2021-06-14) add
(11213070, 2020-11-16) delete
(1075193958, 2021-05-28) add
(1075194668, 2022-11-29) add
(1073757334, 2021-01-20) add
(1073757337, 2021-01-20) add
(1073757349, 2021-01-20) add
(1073757331, 2021-01-20) add
(1073757346, 2021-01-20) add
(1073757237, 2020-11-20) delete
(1073757263, 2020-11-20) delete
(1073757233, 2020-11-20) delete
(1073757241, 2020-11-20) delete
(1073757247, 2020-11-20) delete
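Two caveats may be worth noting: FPAT is specific to GNU awk, and map[$NF] yields an empty string for any trailing word that is not in the map. The following is only a sketch, assuming a POSIX awk, that approximates the field-by-pattern idea with a match() loop and falls back to the original word when it has no mapping. The function name split_by_pattern is invented for this example.

```shell
# Hypothetical helper: emulates the FPAT approach without GNU awk.
split_by_pattern() {
    awk '
    BEGIN { map["deleted"] = "delete"; map["added"] = "add" }
    {
        # The trailing word is the last whitespace-separated field.
        word = $NF
        # Fall back to the word itself when it has no entry in map.
        if (word in map) word = map[word]
        rest = $0
        # Peel off one parenthesized group at a time, as FPAT would.
        while (match(rest, /[(][^)]+[)]/)) {
            print substr(rest, RSTART, RLENGTH), word
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

# Example run on two lines shaped like the sample data.
printf '%s\n' \
    '(1075194668, 2022-11-29) added' \
    '(1073757334, 2021-01-20) (1073757337, 2021-01-20) added' |
    split_by_pattern
```

Using the in operator for the lookup also avoids creating empty map entries for unknown words.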
