I want to filter out the lines that have "same number -> same number"
from this text:
[325194/777232]/var/cache/apt/srcpkgcache.bin: 100% extents: 5 -> 1 [ OK ]
[325195/777232]/var/cache/apt/pkgcache.bin: 100% extents: 4 -> 1 [ OK ]
[325255/777232]/var/cache/man/de/index.db: 100% extents: 2 -> 2 [ OK ]
[325521/777232]/var/log/syslog: 100% extents: 7 -> 1 [ OK ]
[325525/777232]/var/log/lastlog: 100% extents: 2 -> 2 [ OK ]
[325531/777232]/var/log/syslog.1: 100% extents: 5 -> 1 [ OK ]
[325572/777232]/var/log/kern.log: 100% extents: 6 -> 1 [ OK ]
[325589/777232]/var/log/auth.log: 100% extents: 3 -> 1 [ OK ]
[325621/777232]/var/log/faillog: 100% extents: 2 -> 2 [ OK ]
[325625/777232]/var/log/wtmp: 100% extents: 3 -> 1 [ OK ]
[325627/777232]/var/log/kern.log.1: 100% extents: 2 -> 1 [ OK ]
[325644/777232]/var/log/cups/access_log.1: 100% extents: 2 -> 1 [ OK ]
[325810/777232]/var/log/auth.log.1: 100% extents: 2 -> 1 [ OK ]
Answer 1
To get the lines that have the "same number -> same number" pattern:
grep -E '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]'
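-E enables ERE (Extended Regular Expressions)
([[:digit:]]+) matches one or more digits and puts them in capture group 1
[[:blank:]]+ matches one or more horizontal whitespace characters
-> matches literally
\1 refers to the first captured group
[[:blank:]] matches the whitespace character that follows it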
You can use similar logic in other popular text processing tools/languages (e.g. sed, awk, perl).
To get the lines without the pattern, just add the -v option:
grep -vE '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]'
Example:
% cat file.txt
[325194/777232]/var/cache/apt/srcpkgcache.bin: 100% extents: 5 -> 1 [ OK ]
[325195/777232]/var/cache/apt/pkgcache.bin: 100% extents: 4 -> 1 [ OK ]
[325255/777232]/var/cache/man/de/index.db: 100% extents: 2 -> 2 [ OK ]
[325521/777232]/var/log/syslog: 100% extents: 7 -> 1 [ OK ]
[325525/777232]/var/log/lastlog: 100% extents: 2 -> 2 [ OK ]
[325531/777232]/var/log/syslog.1: 100% extents: 5 -> 1 [ OK ]
[325572/777232]/var/log/kern.log: 100% extents: 6 -> 1 [ OK ]
[325589/777232]/var/log/auth.log: 100% extents: 3 -> 1 [ OK ]
[325621/777232]/var/log/faillog: 100% extents: 2 -> 2 [ OK ]
[325625/777232]/var/log/wtmp: 100% extents: 3 -> 1 [ OK ]
[325627/777232]/var/log/kern.log.1: 100% extents: 2 -> 1 [ OK ]
[325644/777232]/var/log/cups/access_log.1: 100% extents: 2 -> 1 [ OK ]
[325810/777232]/var/log/auth.log.1: 100% extents: 2 -> 1 [ OK ]
% grep -E '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]' file.txt
[325255/777232]/var/cache/man/de/index.db: 100% extents: 2 -> 2 [ OK ]
[325525/777232]/var/log/lastlog: 100% extents: 2 -> 2 [ OK ]
[325621/777232]/var/log/faillog: 100% extents: 2 -> 2 [ OK ]
% grep -vE '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]' file.txt
[325194/777232]/var/cache/apt/srcpkgcache.bin: 100% extents: 5 -> 1 [ OK ]
[325195/777232]/var/cache/apt/pkgcache.bin: 100% extents: 4 -> 1 [ OK ]
[325521/777232]/var/log/syslog: 100% extents: 7 -> 1 [ OK ]
[325531/777232]/var/log/syslog.1: 100% extents: 5 -> 1 [ OK ]
[325572/777232]/var/log/kern.log: 100% extents: 6 -> 1 [ OK ]
[325589/777232]/var/log/auth.log: 100% extents: 3 -> 1 [ OK ]
[325625/777232]/var/log/wtmp: 100% extents: 3 -> 1 [ OK ]
[325627/777232]/var/log/kern.log.1: 100% extents: 2 -> 1 [ OK ]
[325644/777232]/var/log/cups/access_log.1: 100% extents: 2 -> 1 [ OK ]
[325810/777232]/var/log/auth.log.1: 100% extents: 2 -> 1 [ OK ]
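One caveat: all the numbers in the sample are single digits. With multi-digit numbers, the pattern above can also match a line such as "12 -> 2", because the capture group is free to start in the middle of "12". A possible stricter variant, sketched here with made-up sample lines, requires a non-digit (or the line boundary) on both sides of the numbers. Note that the number is now capture group 2, so the back-reference becomes \2.

```shell
# Illustrative lines (not from the question): only "b" has equal numbers.
sample='a: 100% extents: 12 -> 2 [ OK ]
b: 100% extents: 2 -> 2 [ OK ]
c: 100% extents: 2 -> 22 [ OK ]'

# Group 1 is the non-digit (or line start) before the number,
# group 2 is the number itself, and \2 must be followed by a non-digit or line end.
strict='(^|[^[:digit:]])([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\2([^[:digit:]]|$)'

printf '%s\n' "$sample" | grep -E "$strict"
```

With GNU grep this should print only the "b" line, whereas the original pattern would also print the "a" line.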
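Answer 2
You can use GNU Awk (gawk) to do this.
Assuming the input is stored in MY_FILE and you only want to see the lines that contain the same numbers, it could look like this:
gawk '{match($0,/([[:digit:]]+)\s*->\s*([[:digit:]]+)/,M);if(M[1]==M[2])print$0}' MY_FILE
If you want to drop the lines where the numbers are the same and show only the lines with different numbers, just replace == with !=:
gawk '{match($0,/([[:digit:]]+)\s*->\s*([[:digit:]]+)/,M);if(M[1]!=M[2])print$0}' MY_FILE
Explanation:
gawk runs the instructions inside the curly braces for each line. These are:
match($0, /([[:digit:]]+)\s*->\s*([[:digit:]]+)/, M) ;
if(M[1] == M[2]) print$0
This means: match the regular expression ([[:digit:]]+)\s*->\s*([[:digit:]]+) against the whole line ($0) and store the match object/array in the variable M.
Then compare the contents of match groups 1 and 2 (the numbers before and after the arrow, respectively) and print the whole line if they are equal (when using ==) or different (when using !=).
Note that the + must be inside the second pair of parentheses, as it is for the first: with ([[:digit:]])+ the group would only hold the last digit of a multi-digit number.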
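A slightly more defensive sketch of the same gawk approach, assuming gawk is installed: it uses the return value of match() so that lines containing no "number -> number" pair at all are skipped instead of being printed because both groups are empty. The + sits inside both capture groups so that multi-digit numbers are captured whole. The function name same_numbers and the sample lines are made up for illustration.

```shell
# same_numbers: print stdin lines whose numbers around "->" are equal (needs gawk).
same_numbers() {
  # match() returns 0 when the regex does not match, so such lines are skipped;
  # a pattern without an action prints the line when the condition is true.
  gawk 'match($0, /([[:digit:]]+)\s*->\s*([[:digit:]]+)/, M) && M[1] == M[2]'
}

printf '%s\n' \
  'x: 100% extents: 12 -> 12 [ OK ]' \
  'y: 100% extents: 12 -> 2 [ OK ]' \
  'a line without an arrow' | same_numbers
```

Of the three sample lines, only the "x" line should be printed.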
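Answer 3
If your data is structured as delimited fields, then you don't really need regex matching.
In your case, -> always appears as the 5th whitespace-separated field, so testing the values of the 4th and 6th fields is sufficient: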
awk '$6 != $4' file
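For illustration, here is the field test run both ways on three shortened lines modelled on the question's data (the [n/3] counters are made up). Flipping != to == selects the same-number lines instead.

```shell
# $4 is the number before "->" ($5) and $6 is the number after it.
sample='[1/3]/var/log/syslog: 100% extents: 7 -> 1 [ OK ]
[2/3]/var/log/lastlog: 100% extents: 2 -> 2 [ OK ]
[3/3]/var/log/wtmp: 100% extents: 3 -> 1 [ OK ]'

# Lines where the numbers differ (the same test as above).
printf '%s\n' "$sample" | awk '$6 != $4'

# Lines where the numbers are the same: flip the comparison.
printf '%s\n' "$sample" | awk '$6 == $4'
```

The first command should print the syslog and wtmp lines, the second only the lastlog line.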
If the position of -> varies, then you could do something like
awk '{for(i=2;i<NF;i++) if ($i == "->" && $(i-1) != $(i+1)) {print; break}}' file
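or first split the line on ->, then split the parts on whitespace and test the last field of the first part against the first field of the second part: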
awk -F' -> ' '{
n=split($1,a,/[ \t]+/); split($2,b,/[ \t]+/); if(b[1] != a[n]) print
}' file
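As a small sketch of how the split-based version copes with the arrow sitting at different positions, the following uses == instead of != so that it keeps the same-number lines. The function name same_around_arrow and the sample lines are invented for the example.

```shell
# same_around_arrow: print stdin lines whose words around " -> " are equal.
same_around_arrow() {
  awk -F' -> ' '{
    n = split($1, a, /[ \t]+/)   # a[n] is the last word before the arrow
    split($2, b, /[ \t]+/)       # b[1] is the first word after the arrow
    if (b[1] == a[n]) print      # == keeps the same-number lines
  }'
}

printf '%s\n' \
  'short: 3 -> 3 [ OK ]' \
  '[1/2]/some/path: 100% extents: 7 -> 1 [ OK ]' \
  'a b c d e f 4 -> 4' | same_around_arrow
```

This should print the first and third lines, even though -> is the 3rd field in one and the 8th in the other.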