Reading a column value from a CSV file based on the values of other columns?

I'm struggling to extract certain data from OECD statistics. The data is provided as a CSV file that looks like this (excerpt):

"COUNTRY","Country","DAGEGR","Age groups","DSEX","Gender","DSTATUS","Status of population","YEAR","Year","Value","Flag Codes","Flags"
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2002","2002",19640979,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2003","2003",19872646,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2004","2004",20091504,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2005","2005",20339759,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2006","2006",20605488,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2007","2007",21015042,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2008","2008",21431781,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2009","2009",21874920,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2010","2010",22342398,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2011","2011",22620554,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2012","2012",22683573,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2002","2002",444050,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2003","2003",448300,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2004","2004",451600,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2005","2005",455000,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2006","2006",469086,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2007","2007",476187,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2008","2008",483799,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2009","2009",493500,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2010","2010",502066,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2011","2011",511840,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2012","2012",524853,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2002","2002",40409330,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2003","2003",41550584,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2004","2004",42345342,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2005","2005",43038035,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2006","2006",43758250,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2007","2007",44474631,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2008","2008",45283259,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2009","2009",45828172,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2010","2010",45989016,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2011","2011",46152926,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2012","2012",46818221,,

What I want to do is automatically read out how many inhabitants Spain (ESP) had in 2007. To get there, I use the following command:

cat RPOP_16012023145346836.csv | cut -d "," -f 1,9,11 | sort

This command returns the country, year, and number of inhabitants, e.g.:

"AUS","2002",19640979
"AUS","2003",19872646
"AUS","2004",20091504
"AUS","2005",20339759
"AUS","2006",20605488
"AUS","2007",21015042
"AUS","2008",21431781
"AUS","2009",21874920
"AUS","2010",22342398
"AUS","2011",22620554
"AUS","2012",22683573
"AUT","2002",8139310
"AUT","2003",8067289
"AUT","2004",8140122
"AUT","2005",8206524
"AUT","2006",8265925
"AUT","2007",8298923
"AUT","2008",8331930
"AUT","2009",8355260
"AUT","2010",8375290
"AUT","2011",8404252
"AUT","2012",8443018
"ESP","2002",40409330
"ESP","2003",41550584
"ESP","2004",42345342
"ESP","2005",43038035
"ESP","2006",43758250
"ESP","2007",44474631
"ESP","2008",45283259
"ESP","2009",45828172
"ESP","2010",45989016
"ESP","2011",46152926
"ESP","2012",46818221
"LUX","2002",444050
"LUX","2003",448300
"LUX","2004",451600
"LUX","2005",455000
"LUX","2006",469086
"LUX","2007",476187
"LUX","2008",483799
"LUX","2009",493500
"LUX","2010",502066
"LUX","2011",511840
"LUX","2012",524853

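I would like to pipe this result into a statement that returns the third column (column 3 = population) based on the country (column 1 = ESP) and the year (column 2 = 2007). Unfortunately, I don't know of a suitable command for this. Could someone help me with this?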

Expected output: 44474631

Answer 1

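Use Miller (mlr) to first filter the data on the two named fields COUNTRY and YEAR, keeping only the records that pertain to Spain in 2007, and then cut out the Value field from that record so that it is the only field left. The output is shown without a header.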

mlr --csv --headerless-csv-output \
    filter '$COUNTRY == "ESP" && $YEAR == 2007' then \
    cut -f Value \
    RPOP_16012023145346836.csv

Given the data in the question, this outputs

44474631

Instead of the filter expression

$COUNTRY == "ESP" && $YEAR == 2007

...you could use

$Country == "Spain" && $Year == 2007

...since those named fields also exist.

Answer 2

If the aim really is simply to return the number 44474631, you can do all of the work with awk on the original input file:

awk -F, '$1=="\"ESP\"" && $9=="\"2007\"" {print $11}' RPOP_16012023145346836.csv

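This sets the field separator to , and prints the contents of column 11, but only if column 1 equals "ESP" (including the quotes) and column 9 equals "2007" (again including the quotes).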
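
If the country and year should not be hard-coded, the same test can be wrapped in a small shell function. The following is only a sketch: the function name lookup and the file sample.csv are made up for illustration, and it relies on the assumption that no quoted field contains a comma (true for the excerpt in the question, but not for CSV in general).

```shell
#!/bin/sh
# Hypothetical sample file: the header plus three rows from the question's excerpt.
cat > sample.csv <<'EOF'
"COUNTRY","Country","DAGEGR","Age groups","DSEX","Gender","DSTATUS","Status of population","YEAR","Year","Value","Flag Codes","Flags"
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2006","2006",43758250,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2007","2007",44474631,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2007","2007",476187,,
EOF

# lookup COUNTRY YEAR FILE (hypothetical helper, not part of any tool)
# Prints column 11 for rows whose column 1 and column 9 match,
# ignoring the double quotes around those two fields.
lookup() {
    awk -F, -v country="$1" -v year="$2" '
        {
            c = $1; y = $9
            gsub(/"/, "", c)   # strip quotes from the country code
            gsub(/"/, "", y)   # strip quotes from the year
        }
        c == country && y == year { print $11 }
    ' "$3"
}

lookup ESP 2007 sample.csv
```

Stripping the quotes from copies of the two fields keeps the comparison values plain (ESP, 2007) while leaving the rest of the line untouched.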

Answer 3

You can do this with sed and awk, stripping the double quotes first:

$ sed 's/"//g' input_file | awk -F, '($1=="ESP" && $9==2007){print $1,$9,$11}'
ESP 2007 44474631
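
The question asked for something to append to the existing cut pipeline, whose output has only three columns. A minimal sketch of that follows, assuming the reduced output still carries the double quotes shown in the question; the function name population_of and the variable reduced are hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical stand-in for the output of: cut -d "," -f 1,9,11 | sort
reduced='"ESP","2006",43758250
"ESP","2007",44474631
"LUX","2007",476187'

# population_of COUNTRY YEAR (hypothetical helper)
# Reads the three-column data on stdin and prints column 3 for the
# matching row. The quotes are added to the search values instead of
# being removed from the data.
population_of() {
    awk -F, -v c="\"$1\"" -v y="\"$2\"" '$1 == c && $2 == y { print $3 }'
}

printf '%s\n' "$reduced" | population_of ESP 2007
```

With the real file, the printf line would be replaced by the cut pipeline from the question, piped into population_of.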
