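I have a comma-separated CSV file with 50 lines. One column holds the state name and the other holds the state's capital. How can I write a loop that counts the number of tokens (2, 3, 4) across the two columns and groups the results into an array? While doing so, is it possible to keep track of how many lines fall into each group?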
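Answer 1
This solution uses awk instead. From the question I understood that the output should contain only the state names. However, the earlier answer produces more useful output, and the OP accepted it, so this script follows the same format and uses the same dataset.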
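{
    x = $0                  # keep the original "State,Capital" line as the key
    gsub(/,/, " ", $0)      # turn commas into spaces; awk re-splits the record
    a[x] = NF               # NF is now the number of tokens on the line
}
END {
    # Count how many lines have each token count.
    for (key in a) {
        counter[a[key]] += 1
    }
    # Print each group; "for (c in counter)" visits keys in unspecified order.
    for (c in counter) {
        print counter[c] " values with " c " tokens:"
        for (key in a) {
            if (c == a[key]) {
                print "\t" key
            }
        }
    }
}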
32 values with 2 tokens:
Oregon,Salem
Virginia,Richmond
Montana,Helena
Florida,Tallahassee
Ohio,Columbus
Delaware,Dover
Nebraska,Lincoln
California,Sacramento
Wisconsin,Madison
Alaska,Juneau
Texas,Austin
Tennessee,Nashville
Hawaii,Honolulu
Maryland,Annapolis
Idaho,Boise
Illinois,Springfield
Wyoming,Cheyenne
Georgia,Atlanta
Connecticut,Hartford
Arizona,Phoenix
Indiana,Indianapolis
Colorado,Denver
Mississippi,Jackson
Washington,Olympia
Kentucky,Frankfort
Vermont,Montpelier
Maine,Augusta
Michigan,Lansing
Kansas,Topeka
Alabama,Montgomery
Massachusetts,Boston
Pennsylvania,Harrisburg
16 values with 3 tokens:
South Dakota,Pierre
New Hampshire,Concord
Arkansas,Little Rock
North Carolina,Raleigh
North Dakota,Bismarck
Louisiana,Baton Rouge
Oklahoma,Oklahoma City
New York,Albany
Nevada,Carson City
Iowa,Des Moines
South Carolina,Columbia
Rhode Island,Providence
New Jersey,Trenton
Minnesota,St. Paul
Missouri,Jefferson City
West Virginia,Charleston
2 values with 4 tokens:
Utah,Salt Lake City
New Mexico,Santa Fe
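To check the token count of a single line without running the whole script, the same gsub/NF idea can be wrapped in a small shell function. This is only a minimal sketch: the function name count_tokens is hypothetical and not part of the answer above.

```shell
#!/bin/bash
# count_tokens is a hypothetical helper, not part of the original answer.
# It replaces commas with spaces so awk re-splits the record on whitespace,
# then prints the resulting field count (NF).
count_tokens() {
    printf '%s\n' "$1" | awk '{ gsub(/,/, " "); print NF }'
}

count_tokens 'Oregon,Salem'          # prints 2
count_tokens 'Utah,Salt Lake City'   # prints 4
```

Because the record is split on whitespace after the substitution, a capital such as "St. Paul" counts as two tokens.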
Answer 2
Given a State Capitals.csv file along the following lines:
Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
...
West Virginia,Charleston
Wisconsin,Madison
Wyoming,Cheyenne
the following Bash script (version 4+) does what you ask for (assuming I have understood the request correctly):
#!/bin/bash -e
export PATH=/bin:/sbin:/usr/bin:/usr/sbin
declare -A a    # token count -> stringified array of "State|Capital" entries
declare -i i j
while IFS=, read -r state capital; do
    # Token count = number of spaces in "state capital" plus one.
    i=$(( $( echo "$state $capital" | tr -cd ' ' | wc -c ) + 1 ))
    if [[ -z ${a[$i]} ]]; then
        declare -a b=()
    else
        eval "${a[$i]}"
    fi
    b+=("$state|$capital")
    a[$i]=$( declare -p b )
# Quote the substitution so the newlines survive on older Bash versions.
done <<< "$( sort 'State Capitals.csv' )"
for i in $( IFS=$'\n'; echo "${!a[*]}" | sort -n ); do
    echo "The following \"state capital\" strings have $i tokens:"
    eval "${a[$i]}"
    for (( j = 0; j < ${#b[@]}; ++j )); do
        echo "${b[$j]}"
    done \
    | column -ts '|' \
    | sed -re 's/^/ /'
done
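The first loop populates the associative array (a), whose index is the number of words in "State Capital" and whose value is the string representation of an array holding the "State|Capital" entries (stringified with declare -p).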
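The stringify-and-restore step can be seen in isolation in the following sketch. The function name roundtrip_demo and the two sample entries are chosen for illustration only; the pattern itself (declare -p to serialise, eval to restore) is the one the script uses.

```shell
#!/bin/bash
# roundtrip_demo is a hypothetical name used only for this illustration.
roundtrip_demo() {
    local -A a                                   # token count -> stringified array
    local -a b=('Utah|Salt Lake City' 'New Mexico|Santa Fe')

    a[4]=$(declare -p b)    # serialise b as a "declare -a b=..." statement
    unset b                 # discard the live array

    eval "${a[4]}"          # re-create b from the stored statement
    printf '%s\n' "${#b[@]}" "${b[1]}"
}

roundtrip_demo
```

Bash array elements cannot themselves be arrays, which is why the value has to be stored as a string. Note that eval executes whatever the string contains, so this is only reasonable here because the string was produced by declare -p within the same script.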
The second loop iterates over the sorted keys of a, uses eval to load each value of a (stringified by declare -p) into the array b, and then iterates over b.
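Iterating over the keys in numeric order is needed because Bash does not return associative-array keys in any particular order. A minimal sketch of that step follows; the function name sorted_keys_demo is hypothetical, and the counts are the group sizes shown in the output of Answer 1.

```shell
#!/bin/bash
# sorted_keys_demo is a hypothetical name used only for this illustration.
sorted_keys_demo() {
    # token count -> number of lines (group sizes taken from Answer 1's output)
    local -A counts=([4]=2 [2]=32 [3]=16)
    local key
    # "${!counts[@]}" expands to the keys in unspecified order;
    # sort -n puts them in ascending numeric order.
    for key in $(printf '%s\n' "${!counts[@]}" | sort -n); do
        echo "$key tokens: ${counts[$key]} lines"
    done
}

sorted_keys_demo
```

The unquoted command substitution relies on word splitting, which is safe here only because the keys are plain integers.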