Split the year out of a date column and create a new column containing the year using awk or sed

I have a dataset in CSV format that looks like this (6,000 rows):

Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
---
---

I want to split the Date column and create a new column that contains only the year, something like this:

Store,Date,Year,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,2010,1641957.44,1,38.51,2.548,211.2421698,8.106

I tried to use the code from this question, but I don't know how to adapt it to my problem: How to extract the year...

Can anyone give me some advice?

Answer 1

I would use a proper CSV parser, such as csvkit:

csvsql --query "select Store,Date,strftime('%Y', Date) as Year,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment  from data" data.csv

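Output (note that csvsql's type inference also rewrites the Date column in YYYY-MM-DD form, reading 05-02-2010 as month-day-year):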

Store,Date,Year,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,2010-05-02,2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,2010-12-02,2010,1641957.44,1,38.51,2.548,211.2421698,8.106

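But if you want to do it with awk, you can do it like this:

awk '
  BEGIN{FS=OFS=","}
  # header: turn "Date" into "Date,Year"
  NR==1{$2="Date,Year"}
  # data rows: duplicate the 4-digit year, e.g. 05-02-2010 -> 05-02-2010,2010
  # ({4} needs an awk with regex interval support, e.g. gawk 4.0+; older mawk lacks it)
  NR>1 {sub(/[0-9]{4}/, "&,&", $2)}
1' data.csv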
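Since the title also asks about sed, here is a minimal sed sketch of the same idea. It assumes Date is always the second field in NN-NN-YYYY form (as in the sample) and that no field contains a quoted comma; the function name add_year_col is only an illustrative, hypothetical name. sed has no notion of fields, so the regex anchors at the start of the line and skips the first field with [^,]*.

```shell
#!/bin/sh
# add_year_col (hypothetical helper): reads one CSV file (or stdin) and
# writes it with a Year column inserted right after the Date column.
# Assumes: Date is field 2, formatted NN-NN-YYYY, and no quoted commas.
add_year_col() {
  sed -E \
    -e '1s/Date/Date,Year/' \
    -e '2,$s/^([^,]*,[0-9]{2}-[0-9]{2}-([0-9]{4}))/\1,\2/' \
    "$@"
}
```

Usage: add_year_col data.csv > data_with_year.csv. The first expression fixes the header; the second captures "Store,Date" as \1 and the year as \2, then writes them back with the year appended as a new field.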

Answer 2

awk 'BEGIN{FS=OFS=","}NR==1{gsub("Date","Date,year",$0)} NR>1 {$2=$2","substr($2,7,4)}1' filename

Output:

Store,Date,year,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,2010,1641957.44,1,38.51,2.548,211.2421698,8.106
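substr($2,7,4) relies on every date being exactly NN-NN-YYYY. If some dates might lack zero padding (e.g. 5-2-2010), a variant that splits the Date field on "-" and takes the last piece is a bit more forgiving. This is a sketch under the same assumptions as above (Date is field 2, no quoted commas); add_year_col_awk is a hypothetical name.

```shell
#!/bin/sh
# add_year_col_awk (hypothetical helper): inserts a Year column after Date,
# taking the year as the last "-"-separated piece of field 2.
add_year_col_awk() {
  awk '
    BEGIN { FS = OFS = "," }
    NR == 1 { $2 = $2 ",Year"; print; next }  # header: Date -> Date,Year
    {
      n = split($2, d, "-")  # pieces of the date; d[n] is the last one
      $2 = $2 "," d[n]       # append the year as a new field after Date
      print
    }' "$@"
}
```

Usage: add_year_col_awk data.csv > data_with_year.csv. Assigning to $2 makes awk rebuild the record with OFS, so the extra comma becomes a real column boundary in the output.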