Completing a file with missing dates, line by line

I have a text file with 2 columns. The first contains a date (DD/MM/YYYY) and the second contains a number. It looks like this:

15/01/1945 105.0
16/01/1945   4.2
17/01/1945   3.0
31/01/1945  12.0
01/02/1945   3.0
02/02/1945 125.0
05/02/1945   0.3

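I need to fill in the file so that it meets the following conditions:

  1. The first date is 01/01/1945
  2. The last date is 31/12/2021
  3. The dates must be consecutive, with one day between successive lines
  4. If a date is missing, a line must be added with the correct date and the number -99.0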

So the final file should look like this:

01/01/1945 -99.0
02/01/1945 -99.0
03/01/1945 -99.0
04/01/1945 -99.0
05/01/1945 -99.0
06/01/1945 -99.0
07/01/1945 -99.0
08/01/1945 -99.0
09/01/1945 -99.0
10/01/1945 -99.0
11/01/1945 -99.0
12/01/1945 -99.0
13/01/1945 -99.0
14/01/1945 -99.0
15/01/1945 105.0
16/01/1945   4.2
17/01/1945   3.0
18/01/1945 -99.0
19/01/1945 -99.0
20/01/1945 -99.0
21/01/1945 -99.0
22/01/1945 -99.0
23/01/1945 -99.0
24/01/1945 -99.0
25/01/1945 -99.0
26/01/1945 -99.0
27/01/1945 -99.0
28/01/1945 -99.0
29/01/1945 -99.0
30/01/1945 -99.0
31/01/1945  12.0
01/02/1945   3.0
02/02/1945 125.0
03/02/1945 -99.0
04/02/1945 -99.0
05/02/1945   0.3
06/02/1945 -99.0
07/02/1945 -99.0
...
30/12/2021 -99.0
31/12/2021 -99.0

I tried with a Fortran program, but it didn't work. I think it might be possible with awk or sed, or both.

This is what I get when I print Ed's script:

meteo@poniente:/datos$ cat awk.script
#!/bin/bash
cat tst.awk
awk { dates2vals[$1] = $2 }
END {
    begDate = "01/01/1945"
    endDate = "31/12/2000"
    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))
    daySecs = 24 * 60 * 60
    for (curSecs=begSecs; curDate!=endDate; curSecs+=daySecs) {
        curDate = strftime("%d/%m/%Y",curSecs)
        print curDate, (curDate in dates2vals ? dates2vals[curDate] : "-99.0")
    }
}

And this is what I get when I run Ed's script:

meteo@poniente:/datos$ ./tst.awk
01/01/1946   3.0
02/01/1946  14.2
...
14/11/2021   0.0
15/11/2021   0.0
16/11/2021   0.0
17/11/2021   0.0
18/11/2021   0.0
19/11/2021   0.0
20/11/2021   0.0
21/11/2021   0.0
22/11/2021  54.1
23/11/2021 -99.0
24/11/2021  27.4
25/11/2021   0.0
29/11/2021   0.0
30/11/2021   0.0
awk: cmd. line:1: {
awk: cmd. line:1:  ^ unexpected newline or end of string
./awk.script: line 4: END: command not found
./awk.script: line 5: begDate: command not found
./awk.script: line 6: endDate: command not found
./awk.script: line 7: syntax error near unexpected token `('
./awk.script: line 7: `    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))'
meteo@poniente:/datos$

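Answer 1

Try generating the full list of dates with seq (in epoch seconds: start, delta = 1 day, end) and the -f option of date, using -99.0 as the default value, then let awk replace that value wherever the file has an entry for the date: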

seq -f"@%.0f" -- -788878800 86400 1640905200 | date -uf- +"%d/%m/%Y -99.0" | awk 'FNR==NR {A[$1] = $2; next} $1 in A {$2 = A[$1]} 1' file - 
01/01/1945 -99.0
02/01/1945 -99.0
.
.
.

14/01/1945 -99.0
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
18/01/1945 -99.0
19/01/1945 -99.0
20/01/1945 -99.0
21/01/1945 -99.0
22/01/1945 -99.0
23/01/1945 -99.0
24/01/1945 -99.0
25/01/1945 -99.0
26/01/1945 -99.0
27/01/1945 -99.0
28/01/1945 -99.0
29/01/1945 -99.0
30/01/1945 -99.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
03/02/1945 -99.0
04/02/1945 -99.0
05/02/1945 0.3
06/02/1945 -99.0
07/02/1945 -99.0
08/02/1945 -99.0
09/02/1945 -99.0
10/02/1945 -99.0
.
.
.
28/12/2021 -99.0
29/12/2021 -99.0
30/12/2021 -99.0
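
Note that the listing above stops at 30/12/2021: the hard-coded end value 1640905200 is 23:00 UTC on 30 December 2021, so the final step to 31/12/2021 is never generated. A possible variant, sketched below, lets date compute both bounds instead of hard-coding them. It assumes GNU date and GNU seq, and the function name fill_dates is made up for this illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU date and GNU seq; fill_dates is a hypothetical name.
fill_dates() {
    # Noon UTC keeps every 86400-second step well inside its calendar day.
    start=$(date -ud '1945-01-01 12:00' +%s)
    end=$(date -ud '2021-12-31 12:00' +%s)

    # awk reads the data file first (FNR==NR) and remembers each value,
    # then reads the generated list on stdin and swaps in the stored
    # value wherever the date is present in the data file.
    seq -f '@%.0f' -- "$start" 86400 "$end" |
        date -uf- +'%d/%m/%Y -99.0' |
        awk 'FNR==NR {A[$1] = $2; next} $1 in A {$2 = A[$1]} 1' "$1" -
}
```

It could be called as `fill_dates file > file.filled`. As with the original one-liner, the data file is assumed to be non-empty, otherwise the FNR==NR test also matches the generated list.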

Answer 2

Using GNU awk for its time functions:

$ cat tst.awk
{ dates2vals[$1] = $2 }
END {
    begDate = "01/01/1945"
    endDate = "31/12/2021"
    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))
    daySecs = 24 * 60 * 60
    for (curSecs=begSecs; curDate!=endDate; curSecs+=daySecs) {
        curDate = strftime("%d/%m/%Y",curSecs)
        print curDate, (curDate in dates2vals ? dates2vals[curDate] : "-99.0")
    }
}

$ awk -f tst.awk file | wc -l
28124
$ awk -f tst.awk file | head -5
01/01/1945 -99.0
02/01/1945 -99.0
03/01/1945 -99.0
04/01/1945 -99.0
05/01/1945 -99.0
$ awk -f tst.awk file | tail -5
27/12/2021 -99.0
28/12/2021 -99.0
29/12/2021 -99.0
30/12/2021 -99.0
31/12/2021 -99.0
$ awk -f tst.awk file | grep -v '99.0'
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
05/02/1945 0.3
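
The errors shown in the question ("END: command not found" and so on) are what a shell prints when the awk program is placed in a bash script without quotes, so that bash tries to run each line itself. A minimal sketch of a shell wrapper follows, with the awk program inside single quotes. As an alternative to mktime, strftime and gensub, which are GNU awk extensions, it walks the calendar by hand, so it should also run on a plain POSIX awk. The names fill_dates_posix, is_leap and days_in are invented for this example.

```shell
#!/bin/sh
# Sketch only: fill_dates_posix is a hypothetical name; no GNU awk extensions used.
fill_dates_posix() {
    # The awk program is single-quoted so the shell passes it to awk untouched.
    awk '
    function is_leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
    function days_in(m, y) {
        if (m == 2) return is_leap(y) ? 29 : 28
        return (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31
    }
    # Remember the value recorded for each date in the input file.
    { dates2vals[$1] = $2 }
    END {
        # Walk every day from 01/01/1945 to 31/12/2021 in order.
        for (y = 1945; y <= 2021; y++)
            for (m = 1; m <= 12; m++) {
                n = days_in(m, y)
                for (d = 1; d <= n; d++) {
                    cur = sprintf("%02d/%02d/%04d", d, m, y)
                    print cur, (cur in dates2vals ? dates2vals[cur] : "-99.0")
                }
            }
    }' "$1"
}
```

It could be called as `fill_dates_posix file > file.filled`. The start and end years are fixed in the loop here; the GNU awk version above remains the shorter option when gawk is available.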
