Cut out lines that are partially duplicated

I am grepping a log file with the following command:

grep "System has completed" my_log.log

and get output like this:

2019-12-07 17:03:09.527   System has completed 0 of 15778 files
2019-12-07 17:03:20.936   System has completed 4 of 15778 files
2019-12-07 17:03:32.381   System has completed 5 of 15778 files
2019-12-07 17:03:44.053   System has completed 5 of 15778 files
2019-12-07 17:03:55.753   System has completed 21 of 15778 files
2019-12-07 17:04:07.252   System has completed 22 of 15778 files
2019-12-07 17:04:18.728   System has completed 28 of 15778 files
2019-12-07 17:04:30.181   System has completed 28 of 15778 files
2019-12-07 17:04:41.627   System has completed 28 of 15778 files

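I would like to process these results further and cut out the lines that repeat the same number of completed files, so that the output is: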

2019-12-07 17:03:09.527   System has completed 0 of 15778 files
2019-12-07 17:03:20.936   System has completed 4 of 15778 files
2019-12-07 17:03:32.381   System has completed 5 of 15778 files
2019-12-07 17:03:55.753   System has completed 21 of 15778 files
2019-12-07 17:04:07.252   System has completed 22 of 15778 files
2019-12-07 17:04:18.728   System has completed 28 of 15778 files

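When several lines repeat the same number, only the first one should be kept. Because of the timestamps, simply filtering for unique lines is not possible. What is the best way to do this?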

Answer 1

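Assuming the number is always in the same position (here the sixth whitespace-separated field), you can use sort with a unique, numeric key on that field: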

grep "System has completed" my_log.log | sort -unk6,6

or uniq, skipping the first two fields (the date and the time) when comparing adjacent lines:

grep "System has completed" my_log.log | uniq -f2
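As a further option, not part of the answer above, the same "keep only the first line per count" idea can be sketched with awk. This assumes, as above, that the completed count is always the sixth whitespace-separated field; the function name dedupe_by_count is a hypothetical helper introduced only for this illustration.

```shell
# Keep only the first line seen for each distinct value of field 6
# (the completed-files count in the sample log lines).
# dedupe_by_count is a hypothetical helper name used for illustration.
dedupe_by_count() {
    # seen[$6]++ evaluates to 0 (false) the first time a count appears,
    # so the negated pattern is true and awk prints that line only.
    awk '!seen[$6]++' "$@"
}

# Usage (reads standard input when no file argument is given):
#   grep "System has completed" my_log.log | dedupe_by_count
```

Unlike uniq, which only collapses adjacent repeats, this drops a repeated count even when other lines sit in between, and unlike sort it leaves the lines in their original order.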
