I have a text file with 2 columns. The first contains dates (DD/MM/YYYY) and the second contains numbers. It looks like this:
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
05/02/1945 0.3
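I need to fill in the file so that it meets the following conditions:
- The first date is 01/01/1945
- The last date is 31/12/2021
- Dates must be consecutive, with exactly one day between rows
- If a date is missing, the row must be completed with the correct date and the number -99.0
So the final file should look like this: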
01/01/1945 -99.0
02/01/1945 -99.0
03/01/1945 -99.0
04/01/1945 -99.0
05/01/1945 -99.0
06/01/1945 -99.0
07/01/1945 -99.0
08/01/1945 -99.0
09/01/1945 -99.0
10/01/1945 -99.0
11/01/1945 -99.0
12/01/1945 -99.0
13/01/1945 -99.0
14/01/1945 -99.0
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
18/01/1945 -99.0
19/01/1945 -99.0
20/01/1945 -99.0
21/01/1945 -99.0
22/01/1945 -99.0
23/01/1945 -99.0
24/01/1945 -99.0
25/01/1945 -99.0
26/01/1945 -99.0
27/01/1945 -99.0
28/01/1945 -99.0
29/01/1945 -99.0
30/01/1945 -99.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
03/02/1945 -99.0
04/02/1945 -99.0
05/02/1945 0.3
06/02/1945 -99.0
07/02/1945 -99.0
...
30/12/2021 -99.0
31/12/2021 -99.0
I tried with a Fortran program, but it did not work. I think it might be possible with awk or sed, or both.
This is what I get when I display Ed's script:
meteo@poniente:/datos$ cat awk.script
#!/bin/bash
cat tst.awk
awk { dates2vals[$1] = $2 }
END {
begDate = "01/01/1945"
endDate = "31/12/2000"
begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))
daySecs = 24 * 60 * 60
for (curSecs=begSecs; curDate!=endDate; curSecs+=daySecs) {
curDate = strftime("%d/%m/%Y",curSecs)
print curDate, (curDate in dates2vals ? dates2vals[curDate] : "-99.0")
}
}
And this is what I get when I run Ed's script:
meteo@poniente:/datos$ ./tst.awk
01/01/1946 3.0
02/01/1946 14.2
...
14/11/2021 0.0
15/11/2021 0.0
16/11/2021 0.0
17/11/2021 0.0
18/11/2021 0.0
19/11/2021 0.0
20/11/2021 0.0
21/11/2021 0.0
22/11/2021 54.1
23/11/2021 -99.0
24/11/2021 27.4
25/11/2021 0.0
29/11/2021 0.0
30/11/2021 0.0
awk: line ord.:1: {
awk: line ord.:1: ^ unexpected newline or end of string
./awk.script: line 4: END: command not found
./awk.script: line 5: begDate: command not found
./awk.script: line 6: endDate: command not found
./awk.script: line 7: syntax error near unexpected element `('
./awk.script: line 7: ` begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))'
meteo@poniente:/datos$
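Answer 1
Try creating the full list of dates with seq (in epoch seconds: start, delta = 1 day, end) and the -f option of date, giving every line the default value -99.0, then replace that value with awk wherever the input file has one: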
seq -f"@%.0f" -- -788878800 86400 1640948400 | date -uf- +"%d/%m/%Y -99.0" | awk 'FNR==NR {A[$1] = $2; next} $1 in A {$2 = A[$1]} 1' file -
01/01/1945 -99.0
02/01/1945 -99.0
.
.
.
14/01/1945 -99.0
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
18/01/1945 -99.0
19/01/1945 -99.0
20/01/1945 -99.0
21/01/1945 -99.0
22/01/1945 -99.0
23/01/1945 -99.0
24/01/1945 -99.0
25/01/1945 -99.0
26/01/1945 -99.0
27/01/1945 -99.0
28/01/1945 -99.0
29/01/1945 -99.0
30/01/1945 -99.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
03/02/1945 -99.0
04/02/1945 -99.0
05/02/1945 0.3
06/02/1945 -99.0
07/02/1945 -99.0
08/02/1945 -99.0
09/02/1945 -99.0
10/02/1945 -99.0
.
.
.
28/12/2021 -99.0
29/12/2021 -99.0
30/12/2021 -99.0
31/12/2021 -99.0
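The hard-coded epoch values in the one-liner can also be derived from the calendar dates, which makes the range easier to check and change. What follows is only a sketch under assumptions: it relies on GNU date (-d, -f) and GNU seq (-f), and the function name fill_dates and its argument order are made up for illustration. Like the one-liner, it assumes the input file is not empty.

```shell
#!/bin/bash
# fill_dates START END FILE   (hypothetical helper; START and END as YYYY-MM-DD)
# Assumes GNU date and GNU seq; other implementations take different options.
fill_dates() {
    local start end
    # Noon UTC: every 86400-second step then lands in the middle of a UTC day.
    start=$(date -u -d "$1 12:00:00" +%s) || return
    end=$(date -u -d "$2 12:00:00" +%s) || return
    # awk reads FILE first and remembers the value for each date (FNR == NR),
    # then reads the generated list from stdin and swaps in known values.
    seq -f '@%.0f' -- "$start" 86400 "$end" |
        date -u -f - +'%d/%m/%Y -99.0' |
        awk 'FNR == NR { val[$1] = $2; next } $1 in val { $2 = val[$1] } 1' "$3" -
}
```

With this defined, fill_dates 1945-01-01 2021-12-31 file should print one line per day for the whole range.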
Answer 2
Using GNU awk for its time functions:
$ cat tst.awk
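# Remember the value recorded for each date in the input.
{ dates2vals[$1] = $2 }
END {
    begDate = "01/01/1945"
    endDate = "31/12/2021"
    # mktime() wants "YYYY MM DD HH MM SS"; starting at noon keeps the
    # 86400-second steps on the right calendar day across DST changes.
    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))
    daySecs = 24 * 60 * 60
    # Walk forward one day at a time until endDate has been printed.
    for (curSecs=begSecs; curDate!=endDate; curSecs+=daySecs) {
        curDate = strftime("%d/%m/%Y",curSecs)
        print curDate, (curDate in dates2vals ? dates2vals[curDate] : "-99.0")
    }
}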
$ awk -f tst.awk file | wc -l
28124
$ awk -f tst.awk file | head -5
01/01/1945 -99.0
02/01/1945 -99.0
03/01/1945 -99.0
04/01/1945 -99.0
05/01/1945 -99.0
$ awk -f tst.awk file | tail -5
27/12/2021 -99.0
28/12/2021 -99.0
29/12/2021 -99.0
30/12/2021 -99.0
31/12/2021 -99.0
$ awk -f tst.awk file | grep -v '99.0'
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
05/02/1945 0.3
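The errors shown in the question ("END: command not found", "begDate: command not found") are what bash prints when the awk program sits unquoted inside a shell script: only the first line reaches awk, and bash then tries to run every following line as a command of its own. Running the program with awk -f tst.awk file, as above, avoids that. If everything has to live in one shell script, the program can be passed to awk as a single-quoted argument instead. The sketch below shows that form. As a further assumption, it swaps the GNU-only mktime(), strftime() and gensub() calls for plain calendar arithmetic so that it does not depend on gawk; the function name fill_dates_awk and the optional date arguments are hypothetical.

```shell
#!/bin/bash
# fill_dates_awk FILE [BEG END]   (hypothetical helper; dates as DD/MM/YYYY)
# The awk program is ONE single-quoted argument, so the shell never tries
# to execute "END {" or "begDate = ..." itself.
# END must be a valid date on or after BEG, otherwise the loop never stops.
fill_dates_awk() {
    awk -v beg="${2:-01/01/1945}" -v end="${3:-31/12/2021}" '
    # Days in month m of year y (Gregorian leap-year rule).
    function mdays(y, m) {
        if (m == 2) return (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? 29 : 28
        return (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31
    }
    # Remember the value recorded for each date in the input.
    { dates2vals[$1] = $2 }
    END {
        split(beg, b, "/")
        d = b[1] + 0; m = b[2] + 0; y = b[3] + 0
        do {
            cur = sprintf("%02d/%02d/%04d", d, m, y)
            print cur, (cur in dates2vals ? dates2vals[cur] : "-99.0")
            # Advance one calendar day, rolling over month and year.
            if (++d > mdays(y, m)) { d = 1; if (++m > 12) { m = 1; y++ } }
        } while (cur != end)
    }' "$1"
}
```

Called as fill_dates_awk file, it should cover the same 01/01/1945 to 31/12/2021 range as tst.awk.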