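Ignoring the timestamps, how can I remove the duplicated non-chat lines below? Chat lines come in two formats:
- starting with a nickname enclosed in angle brackets,
- starting with a nickname followed by "tells you:".
I would prefer to do this in Notepad++, but with Cygwin I also have a range of utilities available.
Original: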
[16:29] You see a sheep; it looks like it weighs about 98.
[16:30] You see a sheep; it looks like it weighs about 100.
[16:52] anonymized tells you: Do you know the bank yet?
[17:11] Only anonymized may access the corpse for now.
[17:12] Only anonymized may access the corpse for now.
[17:14] <anonymized> You can do it later.
[17:14] <anonymized> The dagger for example
[17:15] <anonymized> The dagger for example
[17:15] <dynv> hi
[17:32] gnome has been killed by anonymized and dynv
[17:32] The corpse is too far away.
[17:32] The corpse is too far away.
[17:33] anonymized: now is gets dangerous
Desired result:
[16:29] You see a sheep; it looks like it weighs about 98.
[16:30] You see a sheep; it looks like it weighs about 100.
[16:52] anonymized tells you: Do you know the bank yet?
[17:11] Only anonymized may access the corpse for now.
[17:14] <anonymized> You can do it later.
[17:14] <anonymized> The dagger for example
[17:15] <anonymized> The dagger for example
[17:15] <dynv> hi
[17:32] gnome has been killed by anonymized and dynv
[17:32] The corpse is too far away.
[17:33] anonymized: now is gets dangerous
Thank you very much.
Answer 1
- Ctrl+H
- Find what:
^\[.+?] (?!<\w+>|\w+ tells you:)(.+)\K\R\[.+?] \1
- Replace with:
LEAVE EMPTY
- CHECK Wrap around
- SELECT Regular expression
- UNCHECK . matches newline
- Replace all
Explanation:
^ # beginning of line
\[.+?] # timestamp in square brackets followed by a space
(?! # negative lookahead, make sure what follows is not:
<\w+> # a nickname surrounded by angle brackets
| # OR
\w+ tells you: # a nickname followed by " tells you:"
) # end lookahead
(.+) # group 1, one or more of any character except newline (the text)
\K # forget everything matched up to this position
\R # any kind of line break
\[.+?] # timestamp in square brackets followed by a space
\1 # backreference to group 1 (the text)
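Outside Notepad++, roughly the same idea can be tried in Python. This is only a sketch, not the answer's exact expression: Python's `re` module supports neither `\K` nor `\R`, so the line to keep is captured in a group and written back, and line breaks are assumed to be `\n`. The function name `drop_repeats`, the trailing `$` (so only whole-line repeats count) and the `+` (so longer runs collapse in one pass) are additions made for this example.

```python
import re

# Adapted from the Notepad++ pattern above; the extra capture group, the "+"
# and the "$" are assumptions of this sketch, not part of the original answer.
DEDUP = re.compile(
    r"^(\[.+?\] (?!<\w+>|\w+ tells you:)(.+))"  # group 1: kept line, group 2: its text
    r"(?:\n\[.+?\] \2$)+",                      # following lines repeating that text
    re.MULTILINE,
)


def drop_repeats(log: str) -> str:
    """Collapse runs of consecutive non-chat lines that differ only in timestamp."""
    return DEDUP.sub(r"\1", log)


sample = (
    "[17:11] Only anonymized may access the corpse for now.\n"
    "[17:12] Only anonymized may access the corpse for now.\n"
    "[17:14] <anonymized> The dagger for example\n"
    "[17:15] <anonymized> The dagger for example\n"
)
print(drop_repeats(sample), end="")
```

For this four-line sample, the "Only anonymized may access the corpse" line is printed once and both chat lines are kept.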
Answer 2
I generally find a programmatic approach easier to understand than pure regex magic, even if it takes more lines:
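#!/usr/bin/env python3
import re
import sys

prev = None  # text of the previous timestamped line
for line in sys.stdin:
    line = line.strip()
    if m := re.search(r"^\[\d+:\d+\] (.+)$", line):
        text = m.group(1)
        if re.search(r"^<\S+> ", text):
            print(line)  # chat: <nickname> message
        elif re.search(r"^\S+ tells you: ", text):
            print(line)  # chat: nickname tells you: message
        elif text != prev:
            print(line)  # non-chat line that does not repeat the previous text
        prev = text
    else:
        print(line)  # lines without a timestamp pass through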
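The same logic in Perl:
#!/usr/bin/env perl
use strict;
use warnings;

my $last = "";    # text of the previous timestamped line
while (my $line = <>) {
    if ($line =~ /^\[\d+:\d+\] (.+)/) {
        my $text = $1;
        if ($text =~ /^<\S+> / || $text =~ /^\S+ tells you:/) {
            print $line;    # chat lines are always kept
        } else {
            print $line unless ($text eq $last);    # drop repeated non-chat text
        }
        $last = $text;
    } else {
        print $line;    # lines without a timestamp pass through
    }
}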
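To check the rule against a few lines from the question without piping a file through either script, the same logic can be wrapped in a small generator. This is a sketch only; the names `drop_repeated_events`, `LINE` and `CHAT` are made up for this example and do not appear in the answer.

```python
import re
from typing import Iterable, Iterator

LINE = re.compile(r"^\[\d+:\d+\] (.+)$")           # "[hh:mm] text"
CHAT = re.compile(r"^(?:<\S+> |\S+ tells you: )")  # the two chat formats


def drop_repeated_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines, skipping non-chat lines whose text repeats the previous line's text."""
    prev = None
    for line in lines:
        line = line.rstrip("\r\n")
        m = LINE.match(line)
        if m is None:
            yield line  # no timestamp: pass through unchanged
            continue
        text = m.group(1)
        if CHAT.match(text) or text != prev:
            yield line
        prev = text


sample = [
    "[17:11] Only anonymized may access the corpse for now.",
    "[17:12] Only anonymized may access the corpse for now.",
    "[17:14] <anonymized> The dagger for example",
    "[17:15] <anonymized> The dagger for example",
]
for kept in drop_repeated_events(sample):
    print(kept)
```

Because it accepts any iterable of strings, the generator can also be fed an open file object or `sys.stdin`.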