How do I get the difference in days using awk?

2024-5-27

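For each unique ID ($5) that has more than two records, I need to print the difference, in days, between the record start date and end date ($6) as a new field.

The data looks like this: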

7  65  2    5   32070  2010-12-14    13:25:30    
7  82  2    10  41920  2010-12-14    11:30:45   
7  65  2    5   32070  2010-03-25    10:15:45  
7  83  1    67  29446  2010-12-14    04:15:25          
7  81  1    47  32070  2011-5-11     08:14:20  
7  83  1    67  29446  2011-03-10    06:10:23  
7  82  2    10  41920  2011-02-28    06:25:30    
7  83  1    67  29446  2011-6-22     07:13:24  
7  82  2    10  41920  2011-5-14     06:15:25

I need the output to look like this:

7  65  2    5   32070  2010-12-14    13:25:30   147    
7  82  2    10  41920  2010-12-14    11:30:45   150  
7  65  2    5   32070  2010-03-25    10:15:45   147  
7  83  1    67  29446  2010-12-14    04:15:25   189       
7  81  1    47  32070  2011-5-11     08:14:20   147  
7  83  1    67  29446  2011-03-10    06:10:23   189  
7  82  2    10  41920  2011-02-28    06:25:30   150   
7  83  1    67  29446  2011-6-22     07:13:24   189  
7  82  2    10  41920  2011-5-14     06:15:25   150

I wrote the following code, but it does not handle more than two records per unique ID ($5).

$ awk 'NR==FNR {  
           c = "date -d \""$6 "\" +%s"; # use system date for epoch time seconds  
           c | getline d;                 # execute command in c var,output to d   
           a[$5] = (($5 in a) ? d-a[$5] : d); # set or subtract from array  
           next                           # skip to next record  
       } {                                # for the second go:  
           # $1=$1;                       # uncomment to clean trailing space  
           print $0, int(a[$5]/86400)     # print record and time  difference  
       }' file file

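Answer 1

This solution requires GNU awk, because mktime() is a gawk extension. Save it as a script and pass the input file twice, as in the question, so that the first pass can collect the earliest and latest date for each ID before the second pass prints the records: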

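# First pass: remember the earliest and latest date seen for each ID ($5).
NR == FNR {
    split($6, arr, "-");
    # mktime() takes "YYYY MM DD HH MM SS" and returns epoch seconds in local time.
    date = mktime(sprintf("%4d %02d %02d 00 00 00", arr[1], arr[2], arr[3]));
    if (!start[$5] || date < start[$5]) {
        start[$5] = date;
    }
    if (date > stop[$5]) {
        stop[$5] = date;
    }
    next;
}

# Second pass: append the span between those two dates, in whole days.
{
    print $0 " " int((stop[$5] - start[$5]) / (3600 * 24));
}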
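
If GNU awk is not available, the same two-pass idea can be sketched in plain awk by turning each date into a day number with integer arithmetic instead of calling mktime(). This is only an illustration under some assumptions: the file name data.txt, the helper function days(), and the array names lo and hi are made up for the example, the sample is a four-line subset of the question's data, and dates are assumed to be valid Gregorian dates in YYYY-M-D form.

```shell
# Hypothetical sample file: a subset of the records from the question.
cat > data.txt <<'EOF'
7  65  2    5   32070  2010-12-14    13:25:30
7  82  2    10  41920  2010-12-14    11:30:45
7  65  2    5   32070  2010-03-25    10:15:45
7  82  2    10  41920  2011-02-28    06:25:30
EOF

result=$(awk '
# days(y, m, d): day count from a fixed origin for a Gregorian date.
# January and February are treated as months 13 and 14 of the previous
# year, so the leap day falls at the end of the shifted year.
function days(y, m, d) {
    if (m <= 2) { y--; m += 12 }
    return 365*y + int(y/4) - int(y/100) + int(y/400) + int((153*(m-3) + 2)/5) + d
}
# First pass: track the smallest and largest day number per ID ($5).
NR == FNR {
    split($6, p, "-")
    n = days(p[1] + 0, p[2] + 0, p[3] + 0)
    if (!($5 in lo) || n < lo[$5]) lo[$5] = n
    if (!($5 in hi) || n > hi[$5]) hi[$5] = n
    next
}
# Second pass: append the span in calendar days to every record.
{ print $0, hi[$5] - lo[$5] }
' data.txt data.txt)

printf '%s\n' "$result"
```

With this sample, the appended value is 264 for ID 32070 and 76 for ID 41920. Because the sketch counts calendar days rather than dividing epoch seconds by 86400, it can report one day more than a mktime()-based calculation when a daylight-saving change falls between the two dates.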
