How do I use a shell script to keep only the last populated value for each month in a CSV?

I have a CSV file which gets updated daily by a cron job and looks something like this:

Date,Value
01/11/2019,123
02/11/2019,456
03/11/2019,789
...
31/01/2020,123
01/02/2020,456
02/02/2020,789
03/02/2020,123
04/02/2020,456
05/02/2020,789

I would like the file to be updated by a shell script so that it always has only the last entry for each month, e.g.:

Date,Value
30/11/19,123
31/12/19,456
31/01/20,789
05/02/20,789

Note that the last populated line for each month may not be on the last day of that month.

Not sure how to approach this so would really appreciate some help!

Answer 1

Since your dates are already ordered, you should be able to print the previous record every time the month changes (and once more at the very end).

For example, given

$ cat file.csv
Date,Value
01/11/2019,123
02/11/2019,456
03/11/2019,789
31/01/2020,123
01/02/2020,456
02/02/2020,789
03/02/2020,123
04/02/2020,456
05/02/2020,789 

then

$ awk -F, '{split($1,a,"/")} a[2] != lastm {print last; lastm = a[2]} {last = $0} END {print last}' file.csv
Date,Value
03/11/2019,789
31/01/2020,123
05/02/2020,789 

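You get the header line for free: when the first data row is read, its month differs from the still-empty lastm, so the saved previous record (the header) is printed. Note that the command compares only the month field, so it assumes two adjacent rows never share a month number across different years.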
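
To get the in-place update the question asks for, the same idea can be wrapped in a small script. This is only a sketch under a few assumptions: dates are DD/MM/YYYY in the first column, rows are already in date order, and the function name keep_month_end and the file name file.csv are illustrative. It keys on year and month together rather than on the month alone, and the commented usage line writes to a temporary file before replacing the original.

```shell
#!/bin/sh
# keep_month_end FILE: print the header plus the last row seen for each month.
# Assumes column 1 holds DD/MM/YYYY dates and rows are already in date order.
keep_month_end() {
    awk -F, '
        NR == 1 { print; next }              # pass the header through unchanged
        {
            split($1, d, "/")
            key = d[3] "/" d[2]              # year/month, so 11/2019 differs from 11/2020
            if (NR > 2 && key != lastkey)    # month changed: emit the previous row
                print last
            lastkey = key
            last = $0
        }
        END { if (NR > 1) print last }       # emit the last row of the final month
    ' "$1"
}

# Hypothetical usage: rewrite file.csv in place via a temporary file.
# tmp=$(mktemp) && keep_month_end file.csv > "$tmp" && mv "$tmp" file.csv
```

Running the script again on an already trimmed file should leave it unchanged, so it can follow the daily cron job that appends new rows.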
