Hello, I am trying to remove duplicates from a text file using the TextFX plugin in Notepad++, but it does not work on this kind of text:
209.116.247.120|admin|default|Taiwan (TW)|Tai-pei|Taipei|Unknown
209.116.247.120|admin|default|
209.116.49.130|admin|admin
209.116.49.130|admin|admin|China (CN)|Henan|Zhengzhou|Unknown
209.116.55.142|admin|admin
209.116.55.142|admin|admin|Korea, Republic of (KR)|Seoul-t'ukpyolsi|Seoul|Unknown
209.116.65.26|admin|admin
209.116.65.26|admin|admin|New Zealand (NZ)|Unknown|Unknown|Unknown
As you can see, there are duplicates with the country/region added, so I want to remove these duplicates:
209.116.247.120|admin|default|
209.116.49.130|admin|admin
209.116.55.142|admin|admin
209.116.65.26|admin|admin
or these duplicates:
209.116.247.120|admin|default|Taiwan (TW)|Tai-pei|Taipei|Unknown
209.116.49.130|admin|admin|China (CN)|Henan|Zhengzhou|Unknown
209.116.55.142|admin|admin|Korea, Republic of (KR)|Seoul-t'ukpyolsi|Seoul|Unknown
209.116.65.26|admin|admin|New Zealand (NZ)|Unknown|Unknown|Unknown
If anyone has an idea or a regex command to solve this, I would greatly appreciate it. Thank you.
Answer 1
This works only if the duplicates are on consecutive lines:
- Ctrl+H
- Find what:
^(([^|]+[|][^|]+[|][^|]+)[|]?.*)\R\2
- Replace with:
$1
- Search mode: Regular expression
- Do not check ". matches newline"
- Replace all
Explanation:
^ : beginning of line
( : start group 1
( : start group 2
[^|]+ : 1 or more non-pipe characters
[|] : a pipe
[^|]+ : 1 or more non-pipe characters
[|] : a pipe
[^|]+ : 1 or more non-pipe characters
) : end group 2
[|]? : a pipe, optional
.* : 0 or more of any character except newline
) : end group 1
\R : any kind of line break
\2 : backreference to group 2
Replacement:
$1 : content of group 1, the first line of the duplicate pair
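To see what the two groups capture, here is a minimal Python sketch of the same idea. It is not the Notepad++ engine: Python's re module has no \R, so \r?\n stands in for it, and the character classes also exclude line breaks so that a field can never run onto the next line. Both changes are assumptions of this sketch.

```python
import re

# Adapted from the Notepad++ pattern above (assumptions of this sketch):
#   \R      -> \r?\n        (Python's re has no \R)
#   [^|]+   -> [^|\r\n]+    (keep every field on a single line)
PATTERN = re.compile(
    r"^(([^|\r\n]+[|][^|\r\n]+[|][^|\r\n]+)[|]?.*)\r?\n\2",
    re.MULTILINE,
)

# First pair of lines from the question.
pair = (
    "209.116.247.120|admin|default|Taiwan (TW)|Tai-pei|Taipei|Unknown\n"
    "209.116.247.120|admin|default|\n"
)

match = PATTERN.search(pair)
group1 = match.group(1)  # the whole first line
group2 = match.group(2)  # only the first three fields: ip|user|password

# Replacing the match with group 1 drops the repeated prefix on the second
# line; whatever followed that prefix (here a lone "|") is left in place.
merged = PATTERN.sub(r"\1", pair)

print(group1)
print(group2)
print(merged, end="")
```

Because only the repeated three-field prefix of the second line is consumed, the rest of that line is appended to the first one. That is where the trailing pipe on the first merged line comes from.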
Result of the replacement for the given example:
209.116.247.120|admin|default|Taiwan (TW)|Tai-pei|Taipei|Unknown|
209.116.49.130|admin|admin|China (CN)|Henan|Zhengzhou|Unknown
209.116.55.142|admin|admin|Korea, Republic of (KR)|Seoul-t'ukpyolsi|Seoul|Unknown
209.116.65.26|admin|admin|New Zealand (NZ)|Unknown|Unknown|Unknown
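If the duplicates are not on consecutive lines, the regex above will not catch them. One possible alternative outside Notepad++ is a small script that treats the first three pipe-separated fields as the key and keeps the longest line for each key. The function name dedupe_by_key and the "keep the longest line" rule are assumptions made for this sketch, not part of the answer above.

```python
def dedupe_by_key(lines, key_fields=3):
    """Keep one line per key, preferring the longest line.

    The key is the first `key_fields` pipe-separated fields
    (ip|user|password in the question's data). Hypothetical helper.
    """
    best = {}  # dicts keep insertion order, so keys stay in first-seen order
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        key = tuple(line.split("|")[:key_fields])
        if key not in best or len(line) > len(best[key]):
            best[key] = line
    return list(best.values())


# Sample lines from the question, deliberately shuffled so that the
# duplicates are no longer adjacent.
sample = [
    "209.116.49.130|admin|admin",
    "209.116.247.120|admin|default|Taiwan (TW)|Tai-pei|Taipei|Unknown",
    "209.116.49.130|admin|admin|China (CN)|Henan|Zhengzhou|Unknown",
    "209.116.247.120|admin|default|",
]

result = dedupe_by_key(sample)
for line in result:
    print(line)
```

To keep the short lines instead, the comparison can be flipped to prefer the shorter line. Unlike the regex replacement, this approach never merges two lines, so no stray trailing pipe is produced.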