How to print rows based on the latest date of each month

Question 1

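With Miller (mlr), reading the data as a CSV file named file that uses | (pipe) as its field separator, we can parse the year and month out of the col3 field and get the first record for each year+month combination: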

mlr --csv --fs pipe \
    put 'd = splita($col3, "-"); $y=d[1]; $m=d[2]' then \
    head -n 1 -g y,m then \
    cut -x -f y,m file

The Miller put expression,

d  = splita($col3, "-");
$y = d[1];
$m = d[2];

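...creates a temporary array, d, from the datestamp by splitting it on dashes. We then create two new fields from the split parts, y (year) and m (month).

With head -n 1 -g y,m we then get the first record of each year+month group.

The cut operation at the end removes the y and m fields, which we no longer need.

If the data is not sorted, an additional sorting step has to be applied at the start (or at least before head):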

mlr --csv --fs pipe \
    sort -r col3 then \
    put 'd = splita($col3, "-"); $y=d[1]; $m=d[2]' then \
    head -n 1 -g y,m then \
    cut -x -f y,m file
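
The sort-then-take-first logic of the pipeline above can also be sketched outside Miller. The Python below is only an illustration and not part of the original answer; the function name latest_per_month and the inline sample rows are assumptions made for the example. It sorts the records by the date in the third field, newest first, and then keeps the first record of each year+month group.

```python
from itertools import groupby


def latest_per_month(lines):
    """Return the record with the latest date for each year+month.

    `lines` holds pipe-separated records (no header line) whose third
    field is an ISO date such as 2024-02-29.
    """
    records = [line.split("|") for line in lines]
    # ISO dates sort correctly as plain strings; newest first,
    # comparable to `sort -r col3` in the Miller pipeline.
    records.sort(key=lambda r: r[2], reverse=True)
    result = []
    # After sorting, records of the same month are adjacent, so groupby
    # can play the role of `head -n 1 -g y,m`. The key "YYYY-MM" is the
    # first seven characters of the date.
    for _, group in groupby(records, key=lambda r: r[2][:7]):
        result.append("|".join(next(group)))
    return result


# Hypothetical unsorted sample rows in the same layout as the question's data.
rows = [
    "hds|fsfs|2024-02-28",
    "dfr|jfdfd|2024-01-31",
    "abc|xyz|2024-02-29",
    "fdf|gfgfg|2024-01-30",
]
for row in latest_per_month(rows):
    print(row)
# prints:
# abc|xyz|2024-02-29
# dfr|jfdfd|2024-01-31
```

Grouping on the combined "YYYY-MM" prefix keeps the same month of different years apart, just as grouping on both y and m does in the Miller command.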

To get headerless CSV output, use mlr with its --headerless-csv-output (or --ho) option.

Example run on the given data:

$ mlr --csv --fs pipe --headerless-csv-output sort -r col3 then put 'd = splita($col3, "-"); $y=d[1]; $m=d[2]' then head -n 1 -g y,m then cut -x -f y,m file
abc|xyz|2024-02-29
dfr|jfdfd|2024-01-31
fdg|rgrg|2023-12-31
gfgf|hhfdfd|2023-11-28

Question 2

A shell script that does all the work with sqlite3:

#!/bin/sh

# Filename to process is only arg to script

sqlite3 <<EOF
.mode list
.headers off
.import '$1' data
WITH ranked_dates AS
 (SELECT col1, col2, col3, rank() OVER (PARTITION BY strftime('%Y-%m', col3) ORDER BY col3 DESC) AS date_rank
  FROM data)
SELECT col1, col2, col3
FROM ranked_dates
WHERE date_rank = 1
ORDER BY col3 DESC;
EOF

Example:

$ ./maxdates dates.txt
abc|xyz|2024-02-29
dfr|jfdfd|2024-01-31
fdg|rgrg|2023-12-31
gfgf|hhfdfd|2023-11-28

This one works with input data in any order, rather than assuming it is already sorted.
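
To see that the query does not depend on input order, the same SQL can be run against an in-memory database. The sketch below uses Python's standard sqlite3 module instead of the sqlite3 shell. The helper name max_dates and the sample rows are assumptions made for illustration, the table is filled with INSERT rather than .import, and the window function requires the bundled SQLite library to be version 3.25 or newer.

```python
import sqlite3

# Same query as in the shell script above.
QUERY = """
WITH ranked_dates AS
 (SELECT col1, col2, col3,
         rank() OVER (PARTITION BY strftime('%Y-%m', col3)
                      ORDER BY col3 DESC) AS date_rank
  FROM data)
SELECT col1, col2, col3
FROM ranked_dates
WHERE date_rank = 1
ORDER BY col3 DESC
"""


def max_dates(rows):
    """Return the rows holding the latest date of each year+month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (col1 TEXT, col2 TEXT, col3 TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Hypothetical sample rows, deliberately not in date order.
unsorted_rows = [
    ("fdf", "gfgfg", "2024-01-30"),
    ("abc", "xyz", "2024-02-29"),
    ("dfr", "jfdfd", "2024-01-31"),
    ("hds", "fsfs", "2024-02-28"),
]
for row in max_dates(unsorted_rows):
    print("|".join(row))
# prints:
# abc|xyz|2024-02-29
# dfr|jfdfd|2024-01-31
```

Because rank() assigns 1 to every row tied for the latest date in a partition, two rows sharing the same latest date in a month would both be returned.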

Question 3

Using any awk, take your pick:

$ awk -F'|' 'NR>1 && !seen[substr($NF,1,7)]++' file
abc|xyz|2024-02-29
dfr|jfdfd|2024-01-31
fdg|rgrg|2023-12-31
gfgf|hhfdfd|2023-11-28

$ awk 'NR>1 && !seen[substr($0,length()-9,7)]++' file
abc|xyz|2024-02-29
dfr|jfdfd|2024-01-31
fdg|rgrg|2023-12-31
gfgf|hhfdfd|2023-11-28

$ awk -F'[-|]' 'NR>1 && !seen[$(NF-2)$(NF-1)]++' file
abc|xyz|2024-02-29
dfr|jfdfd|2024-01-31
fdg|rgrg|2023-12-31
gfgf|hhfdfd|2023-11-28
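
All three commands rely on the same idiom: !seen[key]++ is true only the first time a key shows up, so only the first line for each year+month is printed. As a rough illustration in Python (not part of the original answer; the function name first_per_month and the sample lines are assumptions), the first variant could be written as follows. Like the awk commands, it assumes the input is already sorted with the newest dates first.

```python
def first_per_month(lines):
    """Keep the first line seen for each year+month key.

    `lines` includes the header line, which is skipped (awk's NR>1).
    """
    seen = set()
    kept = []
    for line in lines[1:]:
        # Last field, first seven characters: substr($NF,1,7) -> "YYYY-MM".
        key = line.split("|")[-1][:7]
        if key not in seen:  # same effect as awk's !seen[key]++
            seen.add(key)
            kept.append(line)
    return kept


# Hypothetical sample input, newest dates first.
sample = [
    "col1|col2|col3",
    "abc|xyz|2024-02-29",
    "hds|fsfs|2024-02-28",
    "dfr|jfdfd|2024-01-31",
    "fdf|gfgfg|2024-01-30",
]
for line in first_per_month(sample):
    print(line)
# prints:
# abc|xyz|2024-02-29
# dfr|jfdfd|2024-01-31
```

If the input were not sorted, this would return the first line encountered for each month rather than the latest one.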

Question 4

Using Raku (formerly known as Perl_6)

~$ raku -e 'lines[0].put;  my @a = lines.map: *.split: "|";  my $b = @a.map: *.[2];   
            my  %c = $b.classify( { $_.Date.month }, :as{ $_ => $++ } );  
            for %c.map(*.value.max.values).flat.sort -> $n {    
                @a[$n].join("|").put; 
            };'  file

Or:

~$ raku -e 'lines[0].put;  my @a = lines.map: *.split: "|";  my $b = @a.map: *.[2];  
            my %c = $b.classify( { $_.Date.month }, :as{ $_ => $++ } ); 
            put @a[$_].join("|") for %c.map(*.value.max.values).flat.sort;'  file

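Raku is a programming language in the Perl family. One of Raku's features is built-in support for ISO 8601 Dates. This means that as soon as you call Date on a value, it is checked for validity. So for the given dataset, 2024-02-29 works fine, but 2024-02-30 throws a Day out of range error.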
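
For readers more familiar with Python, a comparable validity check can be sketched with the standard datetime module. This is an analogy only, not Raku code and not part of the original answer; the helper name is_valid_iso_date is an assumption made for the example.

```python
from datetime import date


def is_valid_iso_date(text):
    """Return True if `text` is a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(text)
    except ValueError:
        # Raised for impossible dates such as 2024-02-30.
        return False
    return True


print(is_valid_iso_date("2024-02-29"))  # True: 2024 is a leap year
print(is_valid_iso_date("2024-02-30"))  # False: February has no 30th day
```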

The Raku code above should work fine even if the lines are not in date order:

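1. Immediately after reading the header (lines[0]), output it with put.
2. Read the remaining lines, split them on the | bar, and save them to the @a array.
3. Copy the third (date) column into the $b scalar.
4. Using the built-in Date and month routines, classify the $b scalar by month, using :as to return the date and the line number $++ as a {key => value} pair, and save the result to the %c hash.
5. Map over the %c hash element-wise, choosing *.value.max for each month, which lets us obtain the row number (using .values, because that is how the key/value pairs were set up). This returns the indices (0 3 5 7).
6. Iterate through the indices, pulling out the correct positions of the original @a array to put, and join the columns back together with |.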
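
The index bookkeeping in the steps above can be sketched in Python for comparison. This is an illustration, not a translation endorsed by the original answer; the function name latest_row_indices is an assumption. Like the Raku code, it groups on the month number alone, so the same month from two different years would land in one group.

```python
from datetime import date


def latest_row_indices(dates):
    """Return the sorted row indices of the latest date in each month."""
    by_month = {}
    for index, text in enumerate(dates):
        # Parsing validates the date, then the month number is the group key
        # (mirrors classify by .Date.month with date => row-number pairs).
        month = date.fromisoformat(text).month
        by_month.setdefault(month, []).append((text, index))
    # max() compares the (date, index) tuples by date string first,
    # so it picks the latest date; [1] extracts its row index.
    return sorted(max(pairs)[1] for pairs in by_month.values())


# The col3 values of the sample input, in file order (header excluded).
col3 = [
    "2024-02-29", "2024-02-28", "2024-02-27",
    "2024-01-31", "2024-01-30",
    "2023-12-31", "2023-12-30",
    "2023-11-28", "2023-11-27",
]
print(latest_row_indices(col3))  # [0, 3, 5, 7]
```

Grouping on a (year, month) pair instead of the month alone would keep years apart if the data spanned more than twelve months.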

Sample input:

col1|col2|col3
abc|xyz|2024-02-29
hds|fsfs|2024-02-28
fdg|sffe|2024-02-27
dfr|jfdfd|2024-01-31
fdf|gfgfg|2024-01-30
fdg|rgrg|2023-12-31
fgf|yjyjy|2023-12-30
gfgf|hhfdfd|2023-11-28
gfgfg|uysdfd|2023-11-27

Sample output:

col1|col2|col3
abc|xyz|2024-02-29
dfr|jfdfd|2024-01-31
fdg|rgrg|2023-12-31
gfgf|hhfdfd|2023-11-28

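Note: it is sometimes helpful to look at a language's internal representation of the data, so here is what the %c hash looks like after step 4 (sorted in month order):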

~$ raku -e 'lines[0].skip; my @a = lines.map: *.split: "|"; my $b = @a.map: *.[2];  my %c = $b.classify( { $_.Date.month }, :as{ $_ => $++ } ); say %c.sort(*.key.Int);'  file
(1 => [2024-01-31 => 3 2024-01-30 => 4] 2 => [2024-02-29 => 0 2024-02-28 => 1 2024-02-27 => 2] 11 => [2023-11-28 => 7 2023-11-27 => 8] 12 => [2023-12-31 => 5 2023-12-30 => 6])

https://docs.raku.org/type/Date
https://docs.raku.org/language/hashmap#Mutable_hashes_and_immutable_maps
https://raku.org
