How do I filter out lines that have two identical values?

I want to filter out the lines that have same number -> same number

from this text:

    [325194/777232]/var/cache/apt/srcpkgcache.bin:  100%  extents: 5 -> 1   [ OK ]
    [325195/777232]/var/cache/apt/pkgcache.bin: 100%  extents: 4 -> 1   [ OK ]
    [325255/777232]/var/cache/man/de/index.db:  100%  extents: 2 -> 2   [ OK ]
    [325521/777232]/var/log/syslog: 100%  extents: 7 -> 1   [ OK ]
    [325525/777232]/var/log/lastlog:    100%  extents: 2 -> 2   [ OK ]
    [325531/777232]/var/log/syslog.1:   100%  extents: 5 -> 1   [ OK ]
    [325572/777232]/var/log/kern.log:   100%  extents: 6 -> 1   [ OK ]
    [325589/777232]/var/log/auth.log:   100%  extents: 3 -> 1   [ OK ]
    [325621/777232]/var/log/faillog:    100%  extents: 2 -> 2   [ OK ]
    [325625/777232]/var/log/wtmp:   100%  extents: 3 -> 1   [ OK ]
    [325627/777232]/var/log/kern.log.1: 100%  extents: 2 -> 1   [ OK ]
    [325644/777232]/var/log/cups/access_log.1:  100%  extents: 2 -> 1   [ OK ]
    [325810/777232]/var/log/auth.log.1: 100%  extents: 2 -> 1   [ OK ]

Answer 1

To get the lines that match the same number -> same number pattern:

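grep -E '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]'

  • -E enables ERE (Extended Regular Expressions)

  • ([[:digit:]]+) matches one or more digits and puts them in capture group 1

  • [[:blank:]]+ matches one or more horizontal whitespace characters

  • -> matches the literal ->

  • \1 refers back to the first captured group

  • [[:blank:]] matches the whitespace that follows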

You can use similar logic in other popular text processing tools/languages (such as sed, awk, perl).
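As a minimal sketch of the same logic in sed, assuming GNU sed (where -E enables ERE and back-references are accepted in it). The helper names keep_same and drop_same are made up for this illustration, and the two sample lines are taken from the question:

```shell
# Pattern from the grep command above, reused unchanged.
pattern='([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]'

# keep_same: print only the lines where both numbers are equal.
keep_same() { sed -n -E "/$pattern/p" "$@"; }

# drop_same: delete those lines and print everything else.
drop_same() { sed -E "/$pattern/d" "$@"; }

# Demo: only the /var/log/syslog line survives drop_same.
printf '%s\n' \
  '[325255/777232]/var/cache/man/de/index.db:  100%  extents: 2 -> 2   [ OK ]' \
  '[325521/777232]/var/log/syslog: 100%  extents: 7 -> 1   [ OK ]' |
  drop_same
```

Both helpers read standard input when called without arguments, or the named files otherwise, e.g. drop_same file.txt.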

To get the lines without the pattern, just add the -v option:

grep -vE '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]'

Example:

% cat file.txt
[325194/777232]/var/cache/apt/srcpkgcache.bin:  100%  extents: 5 -> 1   [ OK ]
[325195/777232]/var/cache/apt/pkgcache.bin: 100%  extents: 4 -> 1   [ OK ]
[325255/777232]/var/cache/man/de/index.db:  100%  extents: 2 -> 2   [ OK ]
[325521/777232]/var/log/syslog: 100%  extents: 7 -> 1   [ OK ]
[325525/777232]/var/log/lastlog:    100%  extents: 2 -> 2   [ OK ]
[325531/777232]/var/log/syslog.1:   100%  extents: 5 -> 1   [ OK ]
[325572/777232]/var/log/kern.log:   100%  extents: 6 -> 1   [ OK ]
[325589/777232]/var/log/auth.log:   100%  extents: 3 -> 1   [ OK ]
[325621/777232]/var/log/faillog:    100%  extents: 2 -> 2   [ OK ]
[325625/777232]/var/log/wtmp:   100%  extents: 3 -> 1   [ OK ]
[325627/777232]/var/log/kern.log.1: 100%  extents: 2 -> 1   [ OK ]
[325644/777232]/var/log/cups/access_log.1:  100%  extents: 2 -> 1   [ OK ]
[325810/777232]/var/log/auth.log.1: 100%  extents: 2 -> 1   [ OK ]

% grep -E '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]' file.txt
[325255/777232]/var/cache/man/de/index.db:  100%  extents: 2 -> 2   [ OK ]
[325525/777232]/var/log/lastlog:    100%  extents: 2 -> 2   [ OK ]
[325621/777232]/var/log/faillog:    100%  extents: 2 -> 2   [ OK ]

% grep -vE '([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\1[[:blank:]]' file.txt
[325194/777232]/var/cache/apt/srcpkgcache.bin:  100%  extents: 5 -> 1   [ OK ]
[325195/777232]/var/cache/apt/pkgcache.bin: 100%  extents: 4 -> 1   [ OK ]
[325521/777232]/var/log/syslog: 100%  extents: 7 -> 1   [ OK ]
[325531/777232]/var/log/syslog.1:   100%  extents: 5 -> 1   [ OK ]
[325572/777232]/var/log/kern.log:   100%  extents: 6 -> 1   [ OK ]
[325589/777232]/var/log/auth.log:   100%  extents: 3 -> 1   [ OK ]
[325625/777232]/var/log/wtmp:   100%  extents: 3 -> 1   [ OK ]
[325627/777232]/var/log/kern.log.1: 100%  extents: 2 -> 1   [ OK ]
[325644/777232]/var/log/cups/access_log.1:  100%  extents: 2 -> 1   [ OK ]
[325810/777232]/var/log/auth.log.1: 100%  extents: 2 -> 1   [ OK ]
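One caveat: the capture group may start in the middle of a longer number, so a line such as "12 -> 2" also counts as a match, because the trailing 2 of 12 equals the second number. No such line occurs in the sample input, but if multi-digit counts are possible, a stricter pattern could be sketched as follows. The variable name strict, the helper name drop_same_strict, and the two test lines are made up for this illustration, and GNU grep is assumed for back-references in ERE:

```shell
# A non-digit (or the start of the line) must precede the first number,
# so the capture cannot begin inside a longer number. The number is now
# capture group 2, hence the \2 back-reference.
strict='(^|[^[:digit:]])([[:digit:]]+)[[:blank:]]+->[[:blank:]]+\2[[:blank:]]'

# drop_same_strict: print only lines whose two numbers differ.
drop_same_strict() { grep -vE "$strict" "$@"; }

# Demo with hypothetical lines: "12 -> 2" is kept, "2 -> 2" is dropped.
printf '%s\n' \
  'extents: 12 -> 2   [ OK ]' \
  'extents: 2 -> 2   [ OK ]' |
  drop_same_strict
```

The right-hand side needs no extra guard, since the pattern already requires whitespace directly after the second number.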

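Answer 2

You can use GNU Awk (gawk) to do this.

Assuming the input is stored in MY_FILE and you only want to see the lines where both numbers are the same, it might look like this: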

gawk '{match($0,/([[:digit:]]+)\s*->\s*([[:digit:]]+)/,M);if(M[1]==M[2])print$0}' MY_FILE

If you want to remove the lines where the numbers are the same and show only those with different numbers, just replace == with !=

gawk '{match($0,/([[:digit:]]+)\s*->\s*([[:digit:]]+)/,M);if(M[1]!=M[2])print$0}' MY_FILE

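Explanation:

gawk runs the instructions inside the curly braces for every line. These instructions are:

match($0, /([[:digit:]]+)\s*->\s*([[:digit:]]+)/, M) ; 
if(M[1] == M[2]) print$0

This means: match the regular expression ([[:digit:]]+)\s*->\s*([[:digit:]]+) against the whole line ($0) and store the match object/array in the variable M.

Then compare the contents of match groups 1 and 2 (the numbers before and after the arrow, respectively), and print the whole line if they are equal (when using ==) or different (when using !=).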
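The three-argument form of match() is a gawk extension. Where only a POSIX awk is available, roughly the same comparison can be sketched with RSTART and RLENGTH. The helper name same_numbers is made up for this illustration, and [0-9] and [ \t] are used instead of the character classes and \s to stay portable:

```shell
# same_numbers: print lines whose numbers around "->" are equal.
# Lines that contain no "number -> number" pair are skipped.
same_numbers() {
  awk '
    match($0, /[0-9]+[ \t]*->[ \t]*[0-9]+/) {
      pair = substr($0, RSTART, RLENGTH)    # the matched text, e.g. "2 -> 2"
      split(pair, num, /[ \t]*->[ \t]*/)    # num[1] = before, num[2] = after
      if (num[1] == num[2]) print
    }
  ' "$@"
}

# Demo on two sample lines from the question: only the lastlog line is printed.
printf '%s\n' \
  '[325525/777232]/var/log/lastlog:    100%  extents: 2 -> 2   [ OK ]' \
  '[325531/777232]/var/log/syslog.1:   100%  extents: 5 -> 1   [ OK ]' |
  same_numbers
```

To print the lines with different numbers instead, change == to != inside the awk program.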

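Answer 3

If your data is structured as delimited fields, you don't really need a regex match.

In your case, -> always appears as the 5th whitespace-separated field, so testing the values of the 4th and 6th fields is enough: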

awk '$6 != $4' file
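The fixed-position test depends on every line having the same layout; a path containing a space, for instance, would shift the fields. A quick sanity check could be sketched like this, where the helper name check_layout and the second sample line are made up for this illustration:

```shell
# check_layout: report every line whose 5th field is not "->".
# No output means the $4/$6 comparison is safe for this input.
check_layout() {
  awk '$5 != "->" { printf "line %d: unexpected layout: %s\n", FNR, $0 }' "$@"
}

# Demo: the first line is from the question, the second is a hypothetical
# path with a space in it, which pushes "->" to field 6 and gets reported.
printf '%s\n' \
  '[325521/777232]/var/log/syslog: 100%  extents: 7 -> 1   [ OK ]' \
  '[1/2]/var/log/my file: 100%  extents: 2 -> 2   [ OK ]' |
  check_layout
```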

If the position of -> varies, then you could do something like

awk '{for(i=1;i<NF;i++) if ($i == "->" && $(i-1) != $(i+1)) {print; break}}' file

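Or split the line on -> first, then split each part on whitespace, and test the last field of the first part against the first field of the second part: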

awk -F' -> ' '{
  n=split($1,a,/[ \t]+/); split($2,b,/[ \t]+/); if(b[1] != a[n]) print
}' file
