Shell script to count the number of different vowels in a text file

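I am writing a script that should take a given text file and count how many characters it has, how many vowels, and how many of each vowel. The first part was easy, but I'm having trouble with the loop. My understanding is that the length of myString becomes the loop count. Each time a character is read, it goes through the if/elif statements, and when it matches a vowel, the variable for that vowel is incremented by 1.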

Shellcheck.net doesn't like my lines, but I don't understand why:

#!/bin/bash

myString=$(cat sampletext.txt | tr A-Z a-z)   #this works
count=$(echo -n "$myString" |tr -d '[.]'| wc -c)    #this works
vowels=$(echo -n $myString | tr -cd 'aeiou'| wc -c) #this works

va=0
ve=0
vi=0
vo=0
vu=0
i=0
while (( i++ < ${#myString} )); do
char=$(expr substr "$myString" "$i" 1)
if   [ "$char" -eq "a" ]; then
((va=++))
elif [ "$char" -eq "e" ]; then
((ve=++))
elif [ "$char" -eq "i" ]; then
((vi=++))
elif [ "$char" -eq "o" ]; then
((vo=++))
elif [ "$char" -eq "u" ]; then
((vu=++))
fi
done
echo $vi

ShellCheck output:

((va=++))
^-- SC1105 (error): Shells disambiguate (( differently or not at all. For subshell, add spaces around ( . For ((, fix parsing errors.
  ^-- SC2030 (info): Modification of va is local (to subshell caused by (..) group).

*My bad, I only posted the part that wasn't working. I've edited this to show the whole thing, including the shebang =)
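
For reference, here is a minimal sketch of how the loop above could be repaired, assuming bash. Two things stand out in the script as posted: `((va=++))` is not a valid arithmetic expression (`++` needs an operand, as in `((va++))` or `(( va += 1 ))`), and `-eq` inside `[ ]` compares integers, so comparing a character against "a" needs `=` instead. The function name count_vowels, the `for` loop, and the `${myString:i:1}` substring expansion are choices made for this sketch and are not part of the original script.

```shell
#!/bin/bash
# Sketch only: count_vowels is a hypothetical helper, not part of the original script.
count_vowels() {
    local myString va=0 ve=0 vi=0 vo=0 vu=0 i char
    # Lowercase the input, as the original script does with tr.
    myString=$(printf '%s' "$1" | tr 'A-Z' 'a-z')

    for (( i = 0; i < ${#myString}; i++ )); do
        char=${myString:i:1}              # one character at offset i (0-based)
        if   [ "$char" = "a" ]; then      # "=" compares strings; "-eq" compares integers
            (( va += 1 ))
        elif [ "$char" = "e" ]; then
            (( ve += 1 ))
        elif [ "$char" = "i" ]; then
            (( vi += 1 ))
        elif [ "$char" = "o" ]; then
            (( vo += 1 ))
        elif [ "$char" = "u" ]; then
            (( vu += 1 ))
        fi
    done
    printf 'a=%d e=%d i=%d o=%d u=%d\n' "$va" "$ve" "$vi" "$vo" "$vu"
}

count_vowels "Hello WORLD"    # prints: a=0 e=1 i=0 o=2 u=0
```

To run it against the file from the question, something like `count_vowels "$(cat sampletext.txt)"` should work.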

Answer 1

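I'll offer an awk-based solution, since you mentioned that you want to "learn/understand how to use loops to make some reports I want to make"; awk is usually more efficient than pure bash for this kind of job.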

#!/bin/sh
#
grep -o -- . "$1" |
    awk '
        /[[:alpha:]]/ { letters[tolower($1)]++ }
        /[aeiouAEIOU]/ { vowels++ }

        END {
            printf "%d\tvowels\n", vowels;
            for (letter in letters) {
                printf "%d\t%s\n", letters[letter], letter | "sort -k2,3"
            }
        }
    '

Name the file letters and make it executable (chmod a+x letters). If the input file is sampletext.txt you can run it like this

./letters sampletext.txt

Notes

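  • grep -o -- . {file} (assuming GNU grep or a compatible one) splits the file into single characters, one per line. We could do this inside awk itself, but this is a quick (and lazy) approach
  • [[:alpha:]] matches one alphabetic character. You could use [[:alnum:]] for alphanumerics or . for any character
  • The printf | "sort" construct feeds all of its formatted output into (a single instance of) the sort command, which in turn sorts on column 2 according to your current locale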
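
As the first note says, the splitting can also happen inside awk. The sketch below walks each line with substr() instead of relying on grep -o. The function name letter_report is made up for this example, and unlike the script above it lowercases the whole line before matching, so treat it as an illustration, not a drop-in replacement.

```shell
#!/bin/sh
# Sketch only: letter_report is a hypothetical function name.
# It reads the files given as arguments, or standard input if there are none.
letter_report() {
    awk '
        {
            line = tolower($0)
            # Visit every character of the line, one at a time.
            for (i = 1; i <= length(line); i++) {
                c = substr(line, i, 1)
                if (c ~ /[[:alpha:]]/) letters[c]++
                if (c ~ /[aeiou]/)     vowels++
            }
        }
        END {
            printf "%d\tvowels\n", vowels
            # All per-letter lines go to one sort process, ordered by the letter column.
            for (letter in letters)
                printf "%d\t%s\n", letters[letter], letter | "sort -k2,2"
        }
    ' "$@"
}

printf 'Hello WORLD\n' | letter_report
```

With a file, the call would be `letter_report sampletext.txt`.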

Answer 2

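Bash comes with a 190-page manual, which has a table of contents and several indexes (in Appendix D). The (somewhat obscure and scary) syntax hides a lot of functionality.

Most of the answer/tutorial here depends on these sections.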

www.gnu.org/software/bash/manual/bash.html#Arrays

www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion

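Arrays are nothing special: they are just a lazy way to avoid inventing lots of similar names for a set of variables. Think of the index value as the last part of the name. But because the index can itself be a variable, it is a great tool for loops.

Here is the script, hopefully with enough comments to show the intent: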

#! /bin/bash

myString="The quick brown FOX jumps over the lazy dog."

Vowel=( a e i o u )     #.. Declare a list of what we want to output. 

myString="${myString,,}"    #.. Shell substitution to lowercase a string. 

#.. Declares an associative array to store character frequencies.
#.. Typical values would be: Freq[e]="3", Freq[h]="2", Freq[q]="1".
#.. We store counts for all characters, to avoid multiple tests.

declare -A Freq 

#.. Iterate the string, indexing via a substring expansion,
#.. and counting the frequencies of each ASCII character.

for (( j = 0; j < ${#myString}; j++ )); do
    (( Freq[\${myString:j:1}]++ ))
done

declare -p Freq     #.. Debug of the frequency array.

#.. Iterate over the vowel list to report the frequencies.

for v in "${Vowel[@]}"; do
    printf 'Vowel %s occurs %2d times.\n' "${v}" "${Freq["${v}"]}"
done

Here is the output:

$ time ./Calhoun.sh
declare -A Freq=([" "]="9" [.]="1" 
    [a]="1" [b]="1" [c]="1" [d]="1" 
    [e]="3" [f]="1" [g]="1" [h]="2" [i]="1" 
    [j]="1" [k]="1" [l]="1" [m]="1" [n]="1" 
    [o]="4" [p]="1" [q]="1" [r]="2" [s]="1" 
    [t]="2" [u]="2" [v]="1" [w]="1" [x]="1" 
    [y]="1" [z]="1" )
Vowel a occurs  1 times.
Vowel e occurs  3 times.
Vowel i occurs  1 times.
Vowel o occurs  4 times.
Vowel u occurs  2 times.

real    0m0.013s
user    0m0.012s
sys 0m0.000s

Answer 3

myString="Hello WORLD"
declare -A vowel=()      # an associative array
declare -l char          # value is lowercased upon assignment

for ((i=0; i<${#myString}; i++)); do 
    char=${myString:i:1}

    # inside [[...]], the == operator does _pattern matching_
    [[ $char == [aeiou] ]] && ((vowel[$char]++))
done

declare -p vowel   # => ([o]="2" [e]="1" )

A more efficient way to loop over the characters of a string (especially if the string may be long) is

while IFS= read -r -d '' -n1 char; do 
    [[ $char == [aeiou] ]] && ((vowel[$char]++))
done < <(
    printf '%s' "$myString"
)

If you want to include vowels with a count of zero:

myString="Hello WORLD"
declare -A vowel=([a]=0 [e]=0 [i]=0 [o]=0 [u]=0)
declare -l char

while IFS= read -r -d '' -n1 char; do 
    [[ -v "vowel[$char]" ]] && ((vowel[$char]++))
done < <(printf '%s' "$myString")

for char in "${!vowel[@]}"; do
    printf '%s\t%d\n' "$char" "${vowel[$char]}"
done | sort
a   0
e   1
i   0
o   2
u   0
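
The snippets above work on a string, while the question starts from a text file. As a rough sketch, the same read loop can be pointed at a file directly. The function name vowel_counts and the fallback to /dev/stdin are assumptions made for this example, and it needs bash 4 or later for associative arrays and the -l attribute.

```shell
#!/bin/bash
# Sketch only: vowel_counts is a hypothetical function name.
# It reads the file named in $1, or standard input if no file is given.
vowel_counts() {
    local -A vowel=([a]=0 [e]=0 [i]=0 [o]=0 [u]=0)
    local -l char            # value is lowercased upon assignment
    local v

    # Read one character at a time; -d '' keeps newlines from ending a read early.
    while IFS= read -r -d '' -n1 char; do
        if [[ $char == [aeiou] ]]; then
            vowel[$char]=$(( ${vowel[$char]} + 1 ))
        fi
    done < "${1:-/dev/stdin}"

    # Report in a fixed order instead of piping through sort.
    for v in a e i o u; do
        printf '%s\t%d\n' "$v" "${vowel[$v]}"
    done
}

printf 'Hello WORLD\n' | vowel_counts
```

For the file in the question, the call would be `vowel_counts sampletext.txt`.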

Answer 4

A pure awk solution that counts each vowel separately.

#define array vowels with one vowel for each index
BEGIN{ split("aeiou",vowels,"") }

#in each line, make line lowercase
{$0=tolower($0)
#for each vowel occurrence (loops through array vowels),
#replace vowel with empty string.
#gsub returns the number of replacements it has made,
#this value is added to the counter array element for each vowel
for (letter in vowels) { count[letter]+=gsub(vowels[letter],"") } }

#in the end, loop through array vowels and return vowel and counter value
END { for (letter in vowels) {print vowels[letter],count[letter]} }

Save it as e.g. count_vowels.awk and run it via

awk -f count_vowels.awk inputfile.txt

Lines starting with # are comments and can be omitted.
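
Two details of the script above may matter in practice: awk does not define the order in which `for (letter in vowels)` visits the array, and splitting on an empty separator is an extension that POSIX leaves unspecified (common awk implementations do support it). The following is a sketch of a variant that sidesteps both by splitting a space-separated list and looping by numeric index. The wrapper name count_vowels_ordered is invented for this example.

```shell
#!/bin/sh
# Sketch only: count_vowels_ordered is a hypothetical wrapper name.
# It reads the files given as arguments, or standard input if there are none.
count_vowels_ordered() {
    awk '
        # vowels[1]="a" ... vowels[5]="u"; n holds the number of elements
        BEGIN { n = split("a e i o u", vowels, " ") }
        {
            line = tolower($0)
            # gsub returns how many matches it replaced in line
            for (i = 1; i <= n; i++) count[i] += gsub(vowels[i], "", line)
        }
        # Adding 0 makes an untouched counter print as 0 instead of an empty string
        END { for (i = 1; i <= n; i++) print vowels[i], count[i] + 0 }
    ' "$@"
}

printf 'Hello WORLD\n' | count_vowels_ordered
```

The numeric loop guarantees the a, e, i, o, u order in the report, at the cost of hard-coding the list as separate words.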
