How can I remove non-chat duplicate lines while ignoring the timestamps?

Answer 1

Ctrl+H
Find what: ^\[.+?] (?!<\w+>|\w+ tells you:)(.+)\K\R\[.+?] \1
Replace with: LEAVE EMPTY
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all

Explanation:

^                       # beginning of line
\[.+?]                  # timestamp in square brackets, followed by a space
(?!                     # negative lookahead: make sure what follows is not
    <\w+>               #   a nickname in angle brackets
  |                     #   OR
    \w+ tells you:      #   a nickname followed by " tells you:"
)                       # end of lookahead
(.+)                    # group 1: one or more characters other than newline (the text)
\K                      # forget everything matched so far, so only what follows is replaced
\R                      # any kind of line break
\[.+?]                  # timestamp in square brackets, followed by a space
\1                      # backreference to group 1 (the text)

[Screenshot: before]

[Screenshot: after]
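
As a rough illustration of what the replacement does, here is a Python sketch. It is an adaptation, not the same expression: Python's `re` module supports neither `\K` nor `\R`, so the line to keep is captured and written back, and the line break is spelled out as `\r?\n`. The trailing `$` is also an addition, so that a line cannot match as a mere prefix of a longer following line. The sample log and the name `drop_repeats` are made up for the example.

```python
import re

# Group 1 is the whole line to keep, group 2 is its text without the timestamp.
PATTERN = re.compile(
    r"^(\[.+?] (?!<\w+>|\w+ tells you:)(.+))\r?\n\[.+?] \2$",
    re.MULTILINE,
)


def drop_repeats(log: str) -> str:
    """Remove non-chat lines whose text repeats the line directly above."""
    # One pass only removes every second line of a longer run,
    # so repeat until nothing changes any more.
    while True:
        new_log = PATTERN.sub(r"\1", log)
        if new_log == log:
            return log
        log = new_log


sample = (
    "[12:01] You hit the rat.\n"
    "[12:02] You hit the rat.\n"
    "[12:03] <Bob> hello\n"
    "[12:04] <Bob> hello\n"
    "[12:05] Alice tells you: hi\n"
    "[12:06] Alice tells you: hi\n"
)
print(drop_repeats(sample), end="")
```

On this sample only the second "You hit the rat." line is removed; the repeated chat lines and private messages are left alone.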

Answer 2

I usually find a procedural approach easier to understand than pure regex magic, even if it takes more lines:

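#!/usr/bin/env python3
# Requires Python 3.8+ (uses the := operator).
import re
import sys

prev = None  # text of the previous timestamped line
for line in sys.stdin:
    line = line.rstrip("\n")  # drop only the line ending, keep any indentation
    if m := re.search(r"^\[\d+:\d+\] (.+)$", line):
        text = m.group(1)
        if re.search(r"^<\S+> ", text):
            print(line)  # chat message: always keep
        elif re.search(r"^\S+ tells you: ", text):
            print(line)  # private message: always keep
        elif text != prev:
            print(line)  # non-chat line that differs from the previous one
        prev = text
    else:
        print(line)  # no timestamp: pass through unchanged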

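#!/usr/bin/env perl
use strict;
use warnings;

my $last = "";    # text of the previous timestamped line
while (my $line = <>) {
    if ($line =~ /^\[\d+:\d+\] (.+)/) {
        my $text = $1;
        if ($text =~ /^<\S+> / || $text =~ /^\S+ tells you:/) {
            print $line;    # chat or private message: always keep
        } else {
            print $line unless ($text eq $last);    # skip repeated non-chat text
        }
        $last = $text;
    } else {
        print $line;    # no timestamp: pass through unchanged
    }
}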
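
To try the same logic on an in-memory list rather than on standard input, it could be wrapped in a generator roughly like this. This is only a sketch: the name `dedupe` and the sample lines are hypothetical, and the timestamp is assumed to look like `[12:01]`, as in the scripts above.

```python
import re
from typing import Iterable, Iterator, Optional

TIMESTAMPED = re.compile(r"^\[\d+:\d+\] (.+)$")
CHAT = re.compile(r"^(?:<\S+> |\S+ tells you: )")


def dedupe(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines, skipping non-chat text that repeats the previous timestamped line."""
    prev: Optional[str] = None
    for line in lines:
        m = TIMESTAMPED.match(line)
        if m is None:
            yield line  # no timestamp: pass through unchanged
            continue
        text = m.group(1)
        if CHAT.match(text) or text != prev:
            yield line  # chat is always kept; other text only when it changed
        prev = text


sample = [
    "[12:01] You hit the rat.",
    "[12:02] You hit the rat.",
    "[12:03] <Bob> hello",
    "[12:04] <Bob> hello",
    "--- day changed ---",
    "[12:05] The rat dies.",
]
for line in dedupe(sample):
    print(line)
```

Here only the second "You hit the rat." line is dropped; everything else is printed in its original order.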
