I want to sort the following list of filenames/paths.
L1_Data/level1/192027/LC08_L1TP_192027_20201126_20210316_01_T1 DONE
L1_Data/level1/192028/LC08_L1TP_192028_20201126_20210316_01_T1 DONE
L1_Data/level1/192029/LC08_L1TP_192029_20201126_20210316_01_T1 DONE
L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201127_20201223_01_T1 DONE
L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED
L1_Data/level1/191028/LE07_L1TP_191028_20201213_20210108_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201213_20210108_01_T1 DONE
L1_Data/level1/191027/LC08_L1TP_191027_20201221_20210310_01_T1 DONE
L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE
L1_Data/level1/T32TQR/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQR_20200101T110654.SAFE QUEUED
L1_Data/level1/T33TUL/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUL_20200101T110654.SAFE DONE
L1_Data/level1/T33TUM/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUM_20200101T110654.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200104T101319_N0208_R022_T33TUL_20200104T121239.SAFE DONE
L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED
L1_Data/level1/T32TQS/S2A_MSIL1C_20200106T100401_N0208_R122_T32TQS_20200106T103423.SAFE DONE
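Each line contains a filename (including its path) and its work status (QUEUED/DONE). Each filename encodes information about the satellite image data, such as the satellite type, record date and footprint.
Now I want to reorder the list according to the following priorities: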
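- Work status --> QUEUED first. As a single step this is not a problem for me, but I need a solution for the following steps, including their combination (you'll find a more detailed description of my problem after the image below):
- Satellite type (S2A = Sentinel-2A; S2B = Sentinel-2B; LC08 = Landsat 8; LE07 = Landsat 7) --> S2A/B first (no matter whether A or B), then LC08, then LE07. In other words: I want to distinguish between Sentinel-2, Landsat 8 and Landsat 7, but not between Sentinel-2A and Sentinel-2B.
- Record date, ascending
- Footprint, ascending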
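Taken together, these priorities amount to one composite sort key. A minimal Python sketch, assuming each line has already been reduced to a hypothetical `(status, satellite, date, footprint)` tuple; the rank tables are illustrative and deliberately give S2A and S2B the same rank:

```python
# Hypothetical rank tables: lower numbers sort first.
STATUS_RANK = {"QUEUED": 0, "DONE": 1}
SAT_RANK = {"S2A": 0, "S2B": 0, "LC08": 1, "LE07": 2}  # S2A and S2B tie


def sort_key(rec):
    """rec is an assumed (status, satellite, date, footprint) tuple."""
    status, sat, date, footprint = rec
    # Tuples compare element by element, so earlier entries take priority.
    # YYYYMMDD dates sort correctly as plain strings.
    return (STATUS_RANK[status], SAT_RANK[sat], date, footprint)


records = [
    ("DONE", "LE07", "20201127", "191027"),
    ("DONE", "S2A", "20200102", "T32TQS"),
    ("QUEUED", "LC08", "20201212", "192027"),
    ("DONE", "S2B", "20200101", "T33TUL"),
    ("QUEUED", "S2B", "20200101", "T32TQR"),
]

for rec in sorted(records, key=sort_key):
    print(rec)
```

Because S2A and S2B map to the same rank, two Sentinel records fall through to the date and footprint comparison, and the non-alphabetical satellite order lives entirely in the rank table.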
The following image shows the positions of the corresponding substrings; a description of my problem follows it.
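Apart from having only very basic knowledge of the `sort` command, my specific problems are:
- a) addressing the substrings correctly, in
- b) two different filename types (conventions),
- c) the underscore can't simply be used as a field separator: although Sentinel and Landsat filenames both contain six underscores, the substrings between them come in a different order (the record date, for example, is the third underscore-separated field in Sentinel names but the fourth in Landsat names),
- d) the required order S2A/B before LC08 before LE07 is unfortunately not alphabetical, and
- e) addressing the S2A and S2B satellites as one group. This could of course be done by addressing only "S2", but since that is only two characters long, there is a certain risk of confusing it with other parts of the filename string (the real list is much longer and is updated from time to time, so it may contain "false" S2s in other or future lines).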
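For points a), b), c) and e), one option is to describe each naming convention with its own regular expression, anchored right after a `/`, so that an "S2" elsewhere in the path cannot be mistaken for the satellite prefix. A minimal Python sketch; the patterns are assumptions derived only from the sample lines above (e.g. the `MSI…` product type and the six-digit Landsat path/row) and may need loosening for other products:

```python
import re

# Assumed patterns, derived from the sample lines only.
# Sentinel-2: .../S2B_MSIL1C_<date>T<time>_N0208_R122_<tile>_<...>.SAFE STATUS
SENTINEL = re.compile(
    r"/(S2[AB])_MSI\w+?_(\d{8})T\d{6}_N\d{4}_R\d{3}_(T\w{5})_\S+\s+(\S+)$")
# Landsat: .../LC08_L1TP_<path/row>_<record date>_<processing date>_01_T1 STATUS
LANDSAT = re.compile(
    r"/(LC08|LE07)_\w{4}_(\d{6})_(\d{8})_\d{8}_\d{2}_\w{2}\s+(\S+)$")


def parse(line):
    """Return (status, satellite, date, footprint) for one list line."""
    line = line.rstrip("\n")
    m = SENTINEL.search(line)
    if m:
        sat, date, footprint, status = m.groups()
        return status, sat, date, footprint
    m = LANDSAT.search(line)
    if m:
        sat, footprint, date, status = m.groups()
        return status, sat, date, footprint
    # Anything that fits neither convention is reported, not silently sorted.
    raise ValueError(f"unrecognised line: {line!r}")


print(parse("L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE"))
print(parse("L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED"))
```

A line that matches neither pattern raises an error instead of being sorted silently, which is one way to notice new or "false" entries as the list grows.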
In the end, the reordered list should look like this:
L1_Data/level1/T32TQR/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQR_20200101T110654.SAFE QUEUED
L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED
L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED
L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUL_20200101T110654.SAFE DONE
L1_Data/level1/T33TUM/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUM_20200101T110654.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200104T101319_N0208_R022_T33TUL_20200104T121239.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200106T100401_N0208_R122_T32TQS_20200106T103423.SAFE DONE
L1_Data/level1/192027/LC08_L1TP_192027_20201126_20210316_01_T1 DONE
L1_Data/level1/192028/LC08_L1TP_192028_20201126_20210316_01_T1 DONE
L1_Data/level1/192029/LC08_L1TP_192029_20201126_20210316_01_T1 DONE
L1_Data/level1/191028/LE07_L1TP_191028_20201213_20210108_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201213_20210108_01_T1 DONE
L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201127_20201223_01_T1 DONE
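Can anyone help me?
Answer 1
The problem is that the sort fields are not in the same column on every line.
I'm using Perl here for maximum flexibility. This is `custom_sort.pl`: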
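```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data;
while (<>) {
    # capture the fields of an "L" (Landsat) satellite:
    # $1 = satellite, $2 = footprint, $3 = record date, $4 = status
    if (/.*\/(L...)_.*?_(\d+)_(\d+)\S+\s+(.*)/) {
        push @data, [$_, $4, $1, $3, $2];
    }
    # capture the fields of an "S" (Sentinel) satellite:
    # $1 = satellite, $2 = record date, $3 = footprint, $4 = status
    elsif (/.*\/(S..)_.*?_(\d{8}).*?_.*?_.*?_(.*?)_\S+\s+(.*)/) {
        push @data, [$_, $4, $1, $2, $3];
    }
}

sub mysort {
    -($a->[1] cmp $b->[1])                 # work status, descending
      || cmp_satellite($a->[2], $b->[2])   # satellite
      || $a->[3] <=> $b->[3]               # record date
      || $a->[4] cmp $b->[4]               # footprint
}

sub cmp_satellite {
    my ($x, $y) = @_;
    # S2A/S2B rank before any Landsat and compare equal to each other,
    # so two Sentinel entries fall through to the date/footprint keys
    my $rx = $x =~ /^S/ ? 0 : 1;
    my $ry = $y =~ /^S/ ? 0 : 1;
    return ($rx <=> $ry) || ($rx ? $x cmp $y : 0);
}

print $_->[0] for sort mysort @data;
```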
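`mysort` chains its comparisons with `||`, so each comparison has to return 0 on a tie, otherwise the later keys are never consulted; in particular two Sentinel entries must compare equal in `cmp_satellite`. For comparison, the same chained-comparator pattern sketched in Python with `functools.cmp_to_key`; the `(status, satellite, date, footprint)` tuples and the helper names are illustrative assumptions that mirror the Perl subs:

```python
from functools import cmp_to_key


def cmp(x, y):
    """Three-way comparison like Perl's cmp/<=>: returns -1, 0 or 1."""
    return (x > y) - (x < y)


def cmp_satellite(a, b):
    # S2A/S2B sort before any Landsat and tie with each other (return 0)
    sa, sb = a.startswith("S"), b.startswith("S")
    if sa and sb:
        return 0
    if sa:
        return -1
    if sb:
        return 1
    return cmp(a, b)  # "LC08" < "LE07" alphabetically


def mysort(a, b):
    """a, b: hypothetical (status, satellite, date, footprint) tuples."""
    return (-cmp(a[0], b[0])              # work status, descending
            or cmp_satellite(a[1], b[1])  # satellite
            or cmp(a[2], b[2])            # record date
            or cmp(a[3], b[3]))           # footprint


records = [
    ("DONE", "S2A", "20200102", "T32TQS"),
    ("DONE", "LE07", "20201127", "191027"),
    ("DONE", "S2B", "20200101", "T33TUL"),
    ("QUEUED", "LC08", "20201212", "192027"),
    ("DONE", "LC08", "20201126", "192028"),
]
ordered = sorted(records, key=cmp_to_key(mysort))
print(ordered)
```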
Run it with:
```sh
perl custom_sort.pl file
```
Answer 2
Using `awk`, `sort` and `cut`:
```sh
awk -F'[/ ]' -v OFS='\t' '
{
    status=$NF                       # this is the last field
    split($(NF-1), parts, "_")       # split filename into array `parts`
    if (parts[1]=="S2A" || parts[1]=="S2B") type=1
    else if (parts[1]=="LC08") type=2
    else if (parts[1]=="LE07") type=3
    # report to stderr so the message is not fed into sort
    else { print "error, got unknown type " parts[1] > "/dev/stderr"; exit 1 }
    date=(type==1 ? substr(parts[3], 1, 8) : parts[4])
    footprint=(type==1 ? parts[6] : parts[3])
    print status, type, date, footprint, $0
}
' file | sort -k1,1r -k2,2n -k3,3 -k4,4 | cut -f5-
```
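The idea is to extract the work status, satellite type, record date and footprint from each record and store them in four variables, replacing the satellite type with a number that defines the custom order.
These four variables are then printed (tab-separated and followed by the original record), the output is sorted as required, and finally the first four fields are removed again with `cut`.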
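The same decorate-sort-undecorate idea can be sketched in Python. This is a minimal sketch, assuming the `<path> <status>` line format of the sample list; the function names are hypothetical. Because `sort -k1,1r` mixes a descending key with ascending ones, the sketch relies on Python's stable sort and sorts twice: first by the secondary keys, then by status in reverse:

```python
# Numeric type codes define the custom satellite order, as in the awk script.
TYPE = {"S2A": 1, "S2B": 1, "LC08": 2, "LE07": 3}


def decorate(line):
    """Prefix a line with (status, type, date, footprint)."""
    path, status = line.rsplit(" ", 1)
    parts = path.rsplit("/", 1)[-1].split("_")
    t = TYPE[parts[0]]  # KeyError for an unknown type, like the awk "exit 1"
    date = parts[2][:8] if t == 1 else parts[3]
    footprint = parts[5] if t == 1 else parts[2]
    return (status, t, date, footprint, line)


def custom_sort(lines):
    decorated = [decorate(line) for line in lines]
    decorated.sort(key=lambda d: d[1:4])               # type, date, footprint
    decorated.sort(key=lambda d: d[0], reverse=True)   # QUEUED before DONE
    return [d[4] for d in decorated]                   # the "cut -f5-" step


sample = [
    "L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE",
    "L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE",
    "L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED",
    "L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE",
    "L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED",
]
for line in custom_sort(sample):
    print(line)
```

`reverse=True` keeps equal keys in their existing order, so the second pass preserves the ordering established by the first.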
Output:
L1_Data/level1/T32TQR/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQR_20200101T110654.SAFE QUEUED
L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED
L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED
L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUL_20200101T110654.SAFE DONE
L1_Data/level1/T33TUM/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUM_20200101T110654.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200104T101319_N0208_R022_T33TUL_20200104T121239.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200106T100401_N0208_R122_T32TQS_20200106T103423.SAFE DONE
L1_Data/level1/192027/LC08_L1TP_192027_20201126_20210316_01_T1 DONE
L1_Data/level1/192028/LC08_L1TP_192028_20201126_20210316_01_T1 DONE
L1_Data/level1/192029/LC08_L1TP_192029_20201126_20210316_01_T1 DONE
L1_Data/level1/191027/LC08_L1TP_191027_20201221_20210310_01_T1 DONE
L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201127_20201223_01_T1 DONE
L1_Data/level1/191028/LE07_L1TP_191028_20201213_20210108_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201213_20210108_01_T1 DONE