How to remove non-chat duplicates while ignoring the timestamp?

How can I remove the following non-chat duplicates while ignoring the timestamps? Chat lines come in two formats:

  1. starting with a nickname enclosed in angle brackets;
  2. starting with a nickname followed by "tells you:".

I'd prefer to do this in Notepad++, but thanks to Cygwin I can also use various command-line utilities.

Original

[16:29] You see a sheep; it looks like it weighs about 98.
[16:30] You see a sheep; it looks like it weighs about 100.
[16:52] anonymized tells you: Do you know the bank yet?
[17:11] Only anonymized may access the corpse for now.
[17:12] Only anonymized may access the corpse for now.
[17:14] <anonymized> You can do it later.
[17:14] <anonymized> The dagger for example
[17:15] <anonymized> The dagger for example
[17:15] <dynv> hi
[17:32] gnome has been killed by anonymized and dynv
[17:32] The corpse is too far away.
[17:32] The corpse is too far away.
[17:33] anonymized: now is gets dangerous

Expected result

[16:29] You see a sheep; it looks like it weighs about 98.
[16:30] You see a sheep; it looks like it weighs about 100.
[16:52] anonymized tells you: Do you know the bank yet?
[17:11] Only anonymized may access the corpse for now.
[17:14] <anonymized> You can do it later.
[17:14] <anonymized> The dagger for example
[17:15] <anonymized> The dagger for example
[17:15] <dynv> hi
[17:32] gnome has been killed by anonymized and dynv
[17:32] The corpse is too far away.
[17:33] anonymized: now is gets dangerous

Thanks a lot.

Answer 1

  • Ctrl+H
  • Find what: ^\[.+?] (?!<\w+>|\w+ tells you:)(.+)\K\R\[.+?] \1$
  • Replace with: LEAVE EMPTY
  • CHECK Wrap around
  • CHECK Regular expression
  • UNCHECK . matches newline
  • Replace all

Explanation:

^                   # beginning of line
\[.+?]              # timestamp in square brackets, followed by a space
(?!                 # negative lookahead, make sure the text does not start with:
    <\w+>           #   a nickname surrounded by angle brackets
  |                 #   OR
    \w+ tells you:  #   a nickname followed by " tells you:"
)                   # end lookahead
(.+)                # group 1: one or more characters other than newline (the text)
\K                  # forget everything matched so far, so only what follows is replaced
\R                  # any kind of line break
\[.+?]              # timestamp in square brackets, followed by a space
\1                  # back-reference to group 1 (the same text)
$                   # end of line, so the text must be identical, not just a prefix
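If you want to try this kind of pattern outside Notepad++, here is a minimal Python sketch (an adaptation, not part of the original answer). Python's `re` module supports neither `\K` nor `\R`, so this version captures the whole first line as group 1 and writes it back in the replacement, and it assumes `\n` line endings (which is what Python gives you when reading a file in text mode). It also anchors the repeated text with `$`, so a line is only removed when its text is identical to the previous line's, not merely a prefix of it. The names `PATTERN` and `remove_repeats` are illustrative.

```python
import re

# Notepad++ pattern adapted for Python's re (no \K, no \R):
#   group 1 = the whole first line, group 2 = its text after the timestamp.
# The lookahead skips the two chat formats; \2$ requires the next line's
# text to be identical to the first line's text.
PATTERN = re.compile(
    r"^(\[[^\]]+\] (?!<\w+>|\w+ tells you:)(.+))\n\[[^\]]+\] \2$",
    re.MULTILINE,
)


def remove_repeats(text: str) -> str:
    """Collapse consecutive non-chat lines whose text (ignoring timestamps) repeats."""
    # A single pass only collapses pairs, so repeat until nothing changes;
    # this also handles runs of three or more identical lines.
    while True:
        text, count = PATTERN.subn(r"\1", text)
        if count == 0:
            return text


sample = """[17:11] Only anonymized may access the corpse for now.
[17:12] Only anonymized may access the corpse for now.
[17:15] <anonymized> The dagger for example
[17:15] <anonymized> The dagger for example"""

print(remove_repeats(sample))
```

This prints the `[17:11]` line once, followed by both `<anonymized>` lines, which are left alone because they are chat.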


Answer 2

I usually find a procedural approach easier to understand than pure regex magic, even if it takes more lines:

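#!/usr/bin/env python3
# Requires Python 3.8+ for the := (walrus) operator.
import re
import sys

prev = None  # text (without timestamp) of the previous timestamped line
for line in sys.stdin:
    line = line.strip()
    if m := re.search(r"^\[\d+:\d+\] (.+)$", line):
        text = m.group(1)
        if re.search(r"^<\S+> ", text):
            print(line)              # chat: <nick> message
        elif re.search(r"^\S+ tells you: ", text):
            print(line)              # chat: nick tells you: message
        elif text != prev:
            print(line)              # non-chat line that is not a repeat
        prev = text
    else:
        print(line)                  # no timestamp: keep as is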
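
Or, in Perl:

#!/usr/bin/env perl
use strict;
use warnings;

my $last = "";    # text (without timestamp) of the previous timestamped line
while (my $line = <>) {
    if ($line =~ /^\[\d+:\d+\] (.+)/) {
        my $text = $1;
        if ($text =~ /^<\S+> / || $text =~ /^\S+ tells you:/) {
            print $line;                            # chat lines are always kept
        } else {
            print $line unless ($text eq $last);    # drop non-chat repeats
        }
        $last = $text;
    } else {
        print $line;                                # no timestamp: keep as is
    }
}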
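Both scripts above read stdin directly, which makes them awkward to reuse or test. Here is a minimal sketch (a restructuring, not from the original answer) of the same filter written as a generator; the names `dedupe`, `STAMP` and `CHAT` are illustrative. Lines may keep their trailing newline, so the function can be fed a file object directly.

```python
import re
from typing import Iterable, Iterator

# "[HH:MM] text": group 1 is the text after the timestamp.
STAMP = re.compile(r"^\[\d+:\d+\] (.+)$")
# The two chat formats: "<nick> message" and "nick tells you: message".
CHAT = re.compile(r"^(?:<\S+> |\S+ tells you: )")


def dedupe(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines, dropping a non-chat line whose text (timestamp ignored)
    equals the text of the previous timestamped line."""
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            yield line          # no timestamp: always keep
            continue
        text = m.group(1)
        if CHAT.match(text) or text != prev:
            yield line          # chat line, or a non-chat line that is new
        prev = text
```

For example, `sys.stdout.writelines(dedupe(fileinput.input()))` reads the files named on the command line (or stdin when there are none), much like Perl's `<>`. Keeping the filter separate from the I/O means it can be checked against the sample from the question with a plain list of strings.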

相关内容