I'm grepping a log file with the following command:
grep "System has completed" my_log.log
and I get output like this:
2019-12-07 17:03:09.527 System has completed 0 of 15778 files
2019-12-07 17:03:20.936 System has completed 4 of 15778 files
2019-12-07 17:03:32.381 System has completed 5 of 15778 files
2019-12-07 17:03:44.053 System has completed 5 of 15778 files
2019-12-07 17:03:55.753 System has completed 21 of 15778 files
2019-12-07 17:04:07.252 System has completed 22 of 15778 files
2019-12-07 17:04:18.728 System has completed 28 of 15778 files
2019-12-07 17:04:30.181 System has completed 28 of 15778 files
2019-12-07 17:04:41.627 System has completed 28 of 15778 files
I'd like to post-process these results to drop the lines that repeat the same completed-file count, so that the output is:
2019-12-07 17:03:09.527 System has completed 0 of 15778 files
2019-12-07 17:03:20.936 System has completed 4 of 15778 files
2019-12-07 17:03:32.381 System has completed 5 of 15778 files
2019-12-07 17:03:55.753 System has completed 21 of 15778 files
2019-12-07 17:04:07.252 System has completed 22 of 15778 files
2019-12-07 17:04:18.728 System has completed 28 of 15778 files
When several consecutive lines repeat the same number, only the first should be kept. Simply filtering for unique lines won't work, because the timestamps make every line unique. What is the best way to do this?
Answer 1
Assuming the number is always in the same field, you can use sort:
grep "System has completed" my_log.log | sort -unk6,6
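Here `-k6,6` restricts the sort key to field 6 alone (the completed-file count), `-n` compares it numerically, and `-u` emits only one line per key. A minimal sketch with inline sample data standing in for my_log.log:

```shell
# Three lines, two of which share the count 5; sort -u keeps one line per key.
printf '%s\n' \
  '2019-12-07 17:03:32.381 System has completed 5 of 15778 files' \
  '2019-12-07 17:03:44.053 System has completed 5 of 15778 files' \
  '2019-12-07 17:03:55.753 System has completed 21 of 15778 files' \
  | sort -unk6,6
```

Note that sort orders the output by the count rather than by log position; here the two orders coincide because the count only ever increases.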
Or uniq:
grep "System has completed" my_log.log | uniq -f2
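`uniq -f2` skips the first two whitespace-separated fields (the date and the time) before comparing, so adjacent lines that differ only in timestamp collapse into one. If you'd rather key on just the count field, an awk one-liner (my addition, not part of the original answer) keeps the first line for each count while preserving log order:

```shell
# awk prints a line only the first time its 6th field (the count) is seen;
# inline sample data stands in for the grep output from my_log.log.
printf '%s\n' \
  '2019-12-07 17:03:09.527 System has completed 0 of 15778 files' \
  '2019-12-07 17:04:18.728 System has completed 28 of 15778 files' \
  '2019-12-07 17:04:30.181 System has completed 28 of 15778 files' \
  | awk '!seen[$6]++'
```

Against the real log this would be `grep "System has completed" my_log.log | awk '!seen[$6]++'`. Unlike uniq, this also de-duplicates counts that recur non-adjacently.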