Cut the first two characters from the second column

I have a file containing a list of US states and Canadian provinces, like this:

    id,name,abbreviation,country,type,sort,status,occupied,notes,fips_state,assoc_press,standard_federal_region,census_region,census_region_name,census_division,census_division_name,circuit_court
"1","Alabama","AL","USA","state","10","current","occupied","","1","Ala.","IV","3","South","6","East South Central","11"
"2","Alaska","AK","USA","state","10","current","occupied","","2","Alaska","X","4","West","9","Pacific","9"
"3","Arizona","AZ","USA","state","10","current","occupied","","4","Ariz.","IX","4","West","8","Mountain","9"
"4","Arkansas","AR","USA","state","10","current","occupied","","5","Ark.","VI","3","South","7","West South Central","8"
"5","California","CA","USA","state","10","current","occupied","","6","Calif.","IX","4","West","9","Pacific","9"
"6","Colorado","CO","USA","state","10","current","occupied","","8","Colo.","VIII","4","West","8","Mountain","10"
"7","Connecticut","CT","USA","state","10","current","occupied","","9","Conn.","I","1","Northeast","1","New England","2"
"8","Delaware","DE","USA","state","10","current","occupied","","10","Del.","III","3","South","5","South Atlantic","3"
"9","Florida","FL","USA","state","10","current","occupied","","12","Fla.","IV","3","South","5","South Atlantic","11"
"10","Georgia","GA","USA","state","10","current","occupied","","13","Ga.","IV","3","South","5","South Atlantic","11"
"11","Hawaii","HI","USA","state","10","current","occupied","","15","Hawaii","IX","4","West","9","Pacific","9"
"12","Idaho","ID","USA","state","10","current","occupied","","16","Idaho","X","4","West","8","Mountain","9"
"13","Illinois","IL","USA","state","10","current","occupied","","17","Ill.","V","2","Midwest","3","East North Central","7"
"14","Indiana","IN","USA","state","10","current","occupied","","18","Ind.","V","2","Midwest","3","East North Central","7"
"15","Iowa","IA","USA","state","10","current","occupied","","19","Iowa","VII","2","Midwest","4","West North Central","8"
"16","Kansas","KS","USA","state","10","current","occupied","","20","Kan.","VII","2","Midwest","4","West North Central","10"
"17","Kentucky","KY","USA","state","10","current","occupied","","21","Ky.","IV","3","South","6","East South Central","6"
"18","Louisiana","LA","USA","state","10","current","occupied","","22","La.","VI","3","South","7","West South Central","5"
"19","Maine","ME","USA","state","10","current","occupied","","23","Maine","I","1","Northeast","1","New England","1"
"20","Maryland","MD","USA","state","10","current","occupied","","24","Md.","III","3","South","5","South Atlantic","4"
"21","Massachusetts","MA","USA","state","10","current","occupied","","25","Mass.","I","1","Northeast","1","New England","1"
"22","Michigan","MI","USA","state","10","current","occupied","","26","Mich.","V","2","Midwest","3","East North Central","6"
"23","Minnesota","MN","USA","state","10","current","occupied","","27","Minn.","V","2","Midwest","4","West North Central","8"
"24","Mississippi","MS","USA","state","10","current","occupied","","28","Miss.","IV","3","South","6","East South Central","5"
"25","Missouri","MO","USA","state","10","current","occupied","","29","Mo.","VII","2","Midwest","4","West North Central","8"
"26","Montana","MT","USA","state","10","current","occupied","","30","Mont.","VIII","4","West","8","Mountain","9"
"27","Nebraska","NE","USA","state","10","current","occupied","","31","Neb.","VII","2","Midwest","4","West North Central","8"
"28","Nevada","NV","USA","state","10","current","occupied","","32","Nev.","IX","4","West","8","Mountain","9"
"29","New Hampshire","NH","USA","state","10","current","occupied","","33","N.H.","I","1","Northeast","1","New England","1"
"30","New Jersey","NJ","USA","state","10","current","occupied","","34","N.J.","II","1","Northeast","2","Mid-Atlantic","3"
"31","New Mexico","NM","USA","state","10","current","occupied","","35","N.M.","VI","4","West","8","Mountain","10"
"32","New York","NY","USA","state","10","current","occupied","","36","N.Y.","II","1","Northeast","2","Mid-Atlantic","2"
"33","North Carolina","NC","USA","state","10","current","occupied","","37","N.C.","IV","3","South","5","South Atlantic","4"
"34","North Dakota","ND","USA","state","10","current","occupied","","38","N.D.","VIII","2","Midwest","4","West North Central","8"
"35","Ohio","OH","USA","state","10","current","occupied","","39","Ohio","V","2","Midwest","3","East North Central","6"
"36","Oklahoma","OK","USA","state","10","current","occupied","","40","Okla.","VI","3","South","7","West South Central","10"
"37","Oregon","OR","USA","state","10","current","occupied","","41","Ore.","X","4","West","9","Pacific","9"
"38","Pennsylvania","PA","USA","state","10","current","occupied","","42","Pa.","III","1","Northeast","2","Mid-Atlantic","3"
"39","Rhode Island","RI","USA","state","10","current","occupied","","44","R.I.","I","1","Northeast","1","New England","1"
"40","South Carolina","SC","USA","state","10","current","occupied","","45","S.C.","IV","3","South","5","South Atlantic","4"
"41","South Dakota","SD","USA","state","10","current","occupied","","46","S.D.","VIII","2","Midwest","4","West North Central","8"
"42","Tennessee","TN","USA","state","10","current","occupied","","47","Tenn.","IV","3","South","6","East South Central","6"
"43","Texas","TX","USA","state","10","current","occupied","","48","Texas","VI","3","South","7","West South Central","5"
"44","Utah","UT","USA","state","10","current","occupied","","49","Utah","VIII","4","West","8","Mountain","10"
"45","Vermont","VT","USA","state","10","current","occupied","","50","Vt.","I","1","Northeast","1","New England","2"
"46","Virginia","VA","USA","state","10","current","occupied","","51","Va.","III","3","South","5","South Atlantic","4"
"47","Washington","WA","USA","state","10","current","occupied","","53","Wash.","X","4","West","9","Pacific","9"
"48","West Virginia","WV","USA","state","10","current","occupied","","54","W.Va.","III","3","South","5","South Atlantic","4"
"49","Wisconsin","WI","USA","state","10","current","occupied","","55","Wis.","V","2","Midwest","3","East North Central","7"
"50","Wyoming","WY","USA","state","10","current","occupied","","56","Wyo.","VIII","4","West","8","Mountain","10"
"51","Washington DC","DC","USA","capitol","10","current","occupied","","11","","III","3","South","5","South Atlantic","D.C."
"60","Alberta","AB","Canada","province","30","current","occupied","","","","","","","","",""
"61","British Columbia","BC","Canada","province","30","current","occupied","","","","","","","","",""
"62","Manitoba","MB","Canada","province","30","current","occupied","","","","","","","","",""
"63","New Brunswick","NB","Canada","province","30","current","occupied","","","","","","","","",""
"64","Newfoundland and Labrador","NL","Canada","province","30","current","occupied","","","","","","","","",""
"65","Nova Scotia","NS","Canada","province","30","current","occupied","","","","","","","","",""
"66","Ontario","ON","Canada","province","30","current","occupied","","","","","","","","",""
"67","Prince Edward Island","PE","Canada","province","30","current","occupied","","","","","","","","",""
"68","Quebec","QC","Canada","province","30","current","occupied","","","","","","","","",""
"69","Saskatchewan","SK","Canada","province","30","current","occupied","","","","","","","","",""

I want to produce this:

name,country
Alabama,US
...
Wyoming,US
Alberta,Ca
Saskatchewan,Ca

First the US states, then the Canadian provinces.

My solution looks like this:

#!/bin/sh

cat north_america.csv | head -n1 | cut -d',' -f2,4 > title
cat north* | tail -n +2 | cut -d',' -f2,4 | tr -d '"' | sort -t','  -k 2  | head -n10 > Canada
cat north* | tail -n +2 | cut -d',' -f2,4 | tr -d '"' | sort -t','  -k 2  | tail -n +11  > USA

cat USA | rev | cut -c-1 --complement | rev > file1
cat Canada | rev | cut -c 1-4 --complement | rev > file2

cat title > states
cat file1 >> states
cat file2 >> states

My question is: can I somehow "cut" just the first two characters out of the second column? Instead of head and tail, I would use:

cat north* | tail -n +2 | cut -d',' -f2,4 | tr -d '"' | sort -t','  -k2,2r >> states

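Then I would run the "cut" command, but I don't know how. I don't want to use head and tail and split the file into two files. I would like a simpler approach.

I would appreciate any suggestions.

Answer 1

For this, all you need is: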

awk -F, -vOFS="," '{print $2,$4}' file 

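-F, sets the field separator to a comma, and -vOFS="," sets the output field separator to a comma as well. Then we simply print the second and fourth fields of each line. On your example file, this returns: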

$ awk -F, -vOFS="," '{print $2,$4}' file 
name,country
"Alabama","USA"
"Alaska","USA"
"Arizona","USA"
"Arkansas","USA"
"California","USA"
"Colorado","USA"
"Connecticut","USA"
"Delaware","USA"
"Florida","USA"
"Georgia","USA"
"Hawaii","USA"
"Idaho","USA"
"Illinois","USA"
"Indiana","USA"
"Iowa","USA"
"Kansas","USA"
"Kentucky","USA"
"Louisiana","USA"
"Maine","USA"
"Maryland","USA"
"Massachusetts","USA"
"Michigan","USA"
"Minnesota","USA"
"Mississippi","USA"
"Missouri","USA"
"Montana","USA"
"Nebraska","USA"
"Nevada","USA"
"New Hampshire","USA"
"New Jersey","USA"
"New Mexico","USA"
"New York","USA"
"North Carolina","USA"
"North Dakota","USA"
"Ohio","USA"
"Oklahoma","USA"
"Oregon","USA"
"Pennsylvania","USA"
"Rhode Island","USA"
"South Carolina","USA"
"South Dakota","USA"
"Tennessee","USA"
"Texas","USA"
"Utah","USA"
"Vermont","USA"
"Virginia","USA"
"Washington","USA"
"West Virginia","USA"
"Wisconsin","USA"
"Wyoming","USA"
"Washington DC","USA"
"Alberta","Canada"
"British Columbia","Canada"
"Manitoba","Canada"
"New Brunswick","Canada"
"Newfoundland and Labrador","Canada"
"Nova Scotia","Canada"
"Ontario","Canada"
"Prince Edward Island","Canada"
"Quebec","Canada"
"Saskatchewan","Canada"

To remove the quotes, you can pipe the output through tr:

awk -F, -vOFS="," '{print $2,$4}' file | tr -d \"

To get the output exactly as you have shown it (so no " characters, and US and Ca instead of USA and Canada), you can use (assuming GNU sed):

awk -F, -vOFS="," '{print $2,$4}' file | sed 's/"//g; s/USA/US/; s/Canada/Ca/'

Or, if you don't have GNU sed:

awk -F, -vOFS="," '{print $2,$4}' file | sed -e 's/"//g' -e 's/USA/US/' -e 's/Canada/Ca/'
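
If the goal is literally to keep only the first two characters of the country column, rather than to hard-code the replacements for USA and Canada, awk's substr function can probably do that in a single pass. The following is only a minimal sketch: it assumes no field contains an embedded comma (true for the sample shown above), and it uses a small made-up sample file in place of north_america.csv. The file already lists the US states before the Canadian provinces, so no sorting is attempted here.

```shell
#!/bin/sh
# Hypothetical sample: a few rows in the same layout as the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,name,abbreviation,country,type
"1","Alabama","AL","USA","state"
"50","Wyoming","WY","USA","state"
"60","Alberta","AB","Canada","province"
EOF

# First rule: strip every double quote from the record (awk re-splits the fields).
# Second rule: print the header fields unchanged.
# Third rule: print the name and the first two characters of the country.
result=$(awk -F, -v OFS=, '
    { gsub(/"/, "") }
    NR == 1 { print $2, $4; next }
    { print $2, substr($4, 1, 2) }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On the sample above this prints name,country followed by Alabama,US, Wyoming,US and Alberta,Ca. To run it on the real file, replace "$sample" with the file name and redirect the output to states.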
