我正在考虑自动化一些流程/计算,但我可能首先需要格式化一个有点尴尬的CSV文件集。 (为此bash
,我根据要求使用 )。
csv 文件集(大致)遵循以下格式
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
,Second Sitting,,,,,,,,1
RTHM_A8,First Sitting,,,1,,3,,,6
,Second Sitting,,,,,1,,,1
FFBJ_FA9,First Sitting,,,,8,6,,,25
,Second Sitting,,,,,11,,,12
UUYIOR_HJ9,First Sitting,,,1,3,6,,,17
IKRO_Lk8,First Sitting,,,,3,3,,,37
,Second Sitting,,,,6,11,,,34
我试图用上一行中的CODE
字段内容填充该列中的空字段(通常这些空字段出现在第 2 列中的“第二次坐”实例旁边)。因此,对于上面的例子,结果应该是这样的
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
CLLK_J9,Second Sitting,,,,,,,,1
etc.
我开始阅读一些awk
文档,因为它似乎是完成此任务的相当强大的实用程序 - 但尚未取得任何进展。想法?
塔
答案1
使用米勒(https://github.com/johnkerl/miller)非常简单。跑步
mlr --csv fill-down -f CODE input.csv >output.csv
你将会拥有
+------------+----------------+-----+-----+-----+-----+-----+-----+-----+--------+
| CODE | Sitting | Jan | Feb | Mar | Apr | May | Jun | Jul | Totals |
+------------+----------------+-----+-----+-----+-----+-----+-----+-----+--------+
| CLLK_J9 | First Sitting | - | - | 2 | 5 | 2 | - | - | 10 |
| CLLK_J9 | Second Sitting | - | - | - | - | - | - | - | 1 |
| RTHM_A8 | First Sitting | - | - | 1 | - | 3 | - | - | 6 |
| RTHM_A8 | Second Sitting | - | - | - | - | 1 | - | - | 1 |
| FFBJ_FA9 | First Sitting | - | - | - | 8 | 6 | - | - | 25 |
| FFBJ_FA9 | Second Sitting | - | - | - | - | 11 | - | - | 12 |
| UUYIOR_HJ9 | First Sitting | - | - | 1 | 3 | 6 | - | - | 17 |
| IKRO_Lk8 | First Sitting | - | - | - | 3 | 3 | - | - | 37 |
| IKRO_Lk8 | Second Sitting | - | - | - | 6 | 11 | - | - | 34 |
+------------+----------------+-----+-----+-----+-----+-----+-----+-----+--------+
答案2
如果您想尝试使用awk
,则以下内容应该有效:
awk -F',' -v OFS=',' 'FNR>1{if ($1!="") last=$1; else $1=last}1' input.csv
这会将输入和输出字段分隔符设置为,
。
然后,对于每一行第一行之后( FNR>1
),它将检查第一列 ( $1
) 是否为空。如果不是,该值将存储在变量中last
以供以后使用。如果为空,则用之前存储的值填充。
1
规则块的外部指示{ ... }
打印awk
当前行,包括所做的所有修改。请注意,除非true
指定此命令(或实际上评估为 的任何布尔条件),或者在规则块内给出显式print
/命令,否则不会打印当前行。printf
awk
结果:
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
CLLK_J9,Second Sitting,,,,,,,,1
RTHM_A8,First Sitting,,,1,,3,,,6
RTHM_A8,Second Sitting,,,,,1,,,1
FFBJ_FA9,First Sitting,,,,8,6,,,25
FFBJ_FA9,Second Sitting,,,,,11,,,12
UUYIOR_HJ9,First Sitting,,,1,3,6,,,17
IKRO_Lk8,First Sitting,,,,3,3,,,37
IKRO_Lk8,Second Sitting,,,,6,11,,,34
答案3
不是 AWK 解决方案,但是...
$ cat yourfile.csv | csv-sqlite 'select code, (select i2.code from input i2 where i2.code != "" and i2.rowid <= i1.rowid order by i2.rowid desc limit 1) as new_CODE, Sitting, Jan, Feb, Mar, Apr, May, Jun, Jul, Totals from input i1' -s
CODE new_CODE Sitting Jan Feb Mar Apr May Jun Jul Totals
CLLK_J9 CLLK_J9 First Sitting 2 5 2 10
CLLK_J9 Second Sitting 1
RTHM_A8 RTHM_A8 First Sitting 1 3 6
RTHM_A8 Second Sitting 1 1
FFBJ_FA9 FFBJ_FA9 First Sitting 8 6 25
FFBJ_FA9 Second Sitting 11 12
UUYIOR_HJ9 UUYIOR_HJ9 First Sitting 1 3 6 17
IKRO_Lk8 IKRO_Lk8 First Sitting 3 3 37
IKRO_Lk8 Second Sitting 6 11 34
(如果你删除“-s”,它将以 CSV 形式返回)
csv-sqlite 来自https://github.com/mslusarz/csv-nix-tools
查询被盗自这个答案。
答案4
在每个 UNIX 机器上的任何 shell 中使用任何 awk:
$ awk 'BEGIN{FS=OFS=","} $1==""{$1=p} {p=$1} 1' file
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
CLLK_J9,Second Sitting,,,,,,,,1
RTHM_A8,First Sitting,,,1,,3,,,6
RTHM_A8,Second Sitting,,,,,1,,,1
FFBJ_FA9,First Sitting,,,,8,6,,,25
FFBJ_FA9,Second Sitting,,,,,11,,,12
UUYIOR_HJ9,First Sitting,,,1,3,6,,,17
IKRO_Lk8,First Sitting,,,,3,3,,,37
IKRO_Lk8,Second Sitting,,,,6,11,,,34