I want to sort the output of a find command uniquely, so that no filename is repeated across any of the directories.
find /path/to/first_directory/* /path/to/second_directory/* /path/to/third_directory/* -mtime -1 -name "filename_pattern*"
Sample output:
/path/to/first_directory/sample_file1_2017Dec25.dat
/path/to/first_directory/sample_file2_2017Nov01.dat
/path/to/first_directory/sample_file3_2017Oct08.dat
/path/to/first_directory/archive/sample_file1_2017Dec25.dat.Z
/path/to/first_directory/archive/sample_file2_2017Nov01.dat.Z
/path/to/second_directory/sample_file4_2017Sep11.dat
/path/to/second_directory/sample_file5_2017Oct05.dat
/path/to/third_directory/sample_file1_2017Dec25.dat
/path/to/third_directory/sample_file2_2017Nov01.dat
/path/to/third_directory/sample_file3_2017Oct08.dat
/path/to/third_directory/sample_file4_2017Sep11.dat
/path/to/third_directory/sample_file5_2017Oct05.dat
/path/to/third_directory/sample_file6_2017July04.dat
/path/to/third_directory/sample_file6_2017June12.dat
/path/to/third_directory/sample_file7_2017May01.dat
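From the output you can see that there are duplicate filenames in /first_directory/ and /first_directory/archive/, and that all files from /first_directory/* and /second_directory/* are also in /third_directory/*. This means /third_directory/* is an archive directory for all the files found in /first_directory/* and /second_directory/*, but it also holds files that can only be found in /third_directory/* (see sample_file6 and sample_file7).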
What I want to print is just the files, taken in this order: from /first_directory/, then /first_directory/archive/, then /second_directory/, then /third_directory/, with no duplicates and also sorted by date.
Desired output:
/path/to/first_directory/sample_file1_2017Dec25.dat
/path/to/first_directory/sample_file2_2017Nov01.dat
/path/to/first_directory/sample_file3_2017Oct08.dat
/path/to/second_directory/sample_file4_2017Sep11.dat
/path/to/second_directory/sample_file5_2017Oct05.dat
/path/to/third_directory/sample_file6_2017July04.dat
/path/to/third_directory/sample_file6_2017June12.dat
/path/to/third_directory/sample_file7_2017May01.dat
Answer 1
If the output of the find command is saved in a file named filelist, then try:
$ awk -F/ '{f=$NF; sub(/\.Z$/,"",f)} !a[f]++' filelist
/path/to/first_directory/sample_file1_2017Dec25.dat
/path/to/first_directory/sample_file2_2017Nov01.dat
/path/to/first_directory/sample_file3_2017Oct08.dat
/path/to/second_directory/sample_file4_2017Sep11.dat
/path/to/second_directory/sample_file5_2017Oct05.dat
/path/to/third_directory/sample_file6_2017July04.dat
/path/to/third_directory/sample_file6_2017June12.dat
/path/to/third_directory/sample_file7_2017May01.dat
If you want to do the same thing without creating a file:
find /path/to/first_directory/* /path/to/second_directory/* /path/to/third_directory/* -mtime -1 -name "filename_pattern*" | awk -F/ '{f=$NF; sub(/\.Z$/,"",f)} !a[f]++'
Or, if you prefer the command spread over multiple lines, use:
find /path/to/first_directory/* /path/to/second_directory/* \
/path/to/third_directory/* -mtime -1 -name "filename_pattern*" |
awk -F/ '{f=$NF; sub(/\.Z$/,"",f)} !a[f]++'
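We add \ to the end of the first line because it is bash's line-continuation character. Because the second line ends with |, no line-continuation character is needed there.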
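How it works
find
First, it is important to list the directories in the command in order of priority. I see that you have already done that.
-F/
This tells awk to use / as the field separator. This means that the filename will be the last field, $NF.
f=$NF; sub(/\.Z$/,"",f)
This assigns the filename to the variable f and then removes the final .Z from f (if one is present).
!a[f]++
If f has not been seen before, print this line.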
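To see the filter's behaviour in isolation, here is a minimal sketch that feeds it a few made-up paths. The function name dedup_by_basename and the sample paths are assumptions for illustration only; the awk program is the one from the answer, with the array renamed to seen for readability.

```shell
# Hypothetical helper; wraps the awk filter from the answer.
dedup_by_basename() {
    # -F/ splits each path on "/", so $NF is the file name.
    # sub() strips a trailing ".Z"; !seen[f]++ is true only the first time f appears.
    awk -F/ '{ f = $NF; sub(/\.Z$/, "", f) } !seen[f]++'
}

# Made-up sample paths, listed in priority order.
result=$(printf '%s\n' \
    /path/to/first_directory/sample_file1.dat \
    /path/to/first_directory/archive/sample_file1.dat.Z \
    /path/to/second_directory/sample_file4.dat \
    /path/to/third_directory/sample_file1.dat \
    /path/to/third_directory/sample_file6.dat |
    dedup_by_basename)

printf '%s\n' "$result"
```

Only the first path seen for each base name survives, which is why the order of the directories on the find command line matters.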
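Update 1: Removing other extensions
According to the comments, .Z is not the only extension that needs to be removed. There may be other extensions, such as .dat.edi or .dat.bak, which should simply be replaced with .dat. In that case: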
awk -F/ '{f=$NF; sub(/\.dat.*/,".dat",f)} !a[f]++' filelist
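As a quick check of what the broader pattern does, the sketch below prints only the normalized names. The helper name normalize_name and the sample paths are assumptions for illustration. One thing to keep in mind is that \.dat.* matches from the first ".dat" anywhere in the name, so a file such as my.data.txt would also be reduced to my.dat.

```shell
# Hypothetical helper; prints the normalized file name for each input path.
normalize_name() {
    # Replace everything from the first ".dat" onward with ".dat".
    awk -F/ '{ f = $NF; sub(/\.dat.*/, ".dat", f); print f }'
}

# Made-up variants of the same file with different trailing extensions.
names=$(printf '%s\n' \
    /path/to/a/sample_file1.dat.Z \
    /path/to/a/sample_file1.dat.edi \
    /path/to/a/sample_file1.dat.bak \
    /path/to/a/sample_file1.dat |
    normalize_name | sort -u)

printf '%s\n' "$names"
```

All four variants collapse to the single name sample_file1.dat, so the !a[f]++ test treats them as duplicates.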
Update 2: Showing the files sorted by timestamp:
awk -F/ '{f=$NF; sub(/\.dat.*/,".dat",f)} !a[f]++' filelist | xargs -d'\n' -r ls -t
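The following sketch illustrates the sorting step on its own, assuming GNU xargs (the -d and -r options are GNU extensions). The temporary directory and the timestamps are made up for the example. ls -t lists the newest modification time first. With a very long list, xargs may split the arguments over several ls invocations, each of which would be sorted separately.

```shell
# Create three made-up files with known modification times.
tmpdir=$(mktemp -d)
touch -t 201705010000 "$tmpdir/sample_file7_2017May01.dat"
touch -t 201712250000 "$tmpdir/sample_file1_2017Dec25.dat"
touch -t 201709110000 "$tmpdir/sample_file4_2017Sep11.dat"

# -d'\n' treats each input line as one argument (safe for names with spaces);
# -r skips running ls when the input is empty.
sorted=$(printf '%s\n' "$tmpdir"/*.dat | xargs -d'\n' -r ls -t)

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

The files come out newest first (Dec25, Sep11, May01), regardless of the order in which they were passed in.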