Shell script to count the number of different vowels in a text file

Answer 1

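I'll offer an awk-based solution, since you mentioned that you want to "learn/understand how to use loops to make some reports I want to make", and awk is generally more efficient than pure bash for this kind of thing.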

#!/bin/sh
#
grep -o -- . "$1" |
    awk '
        /[[:alpha:]]/ { letters[tolower($1)]++ }
        tolower($1) ~ /[aeiou]/ { vowels++ }    # also counts uppercase vowels

        END {
            printf "%d\tvowels\n", vowels;
            for (letter in letters) {
                printf "%d\t%s\n", letters[letter], letter | "sort -k2,3"
            }
        }
    '

Name the file letters and make it executable (chmod a+x letters). If the input file is sampletext.txt, you can run it like this:

./letters sampletext.txt

Notes

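grep -o -- . {file} (assuming GNU grep or compatible) splits the file into individual characters, one per line. We could do this inside awk itself, but this is a quick (and lazy) approach.
[[:alpha:]] matches an alphabetic character. You can use [[:alnum:]] for alphanumerics, or . for any character.
The printf | "sort" construct feeds all of its formatted output into a single instance of sort, which in turn sorts on column 2 according to your current locale.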
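The first note says the character splitting could be done inside awk itself. A minimal sketch of that, assuming a POSIX awk with bracket character classes: the function name count_letters is hypothetical, and unlike the script above it prints the vowel total after the sorted letter counts, because closing the sort pipe first makes the order of the two outputs predictable.

```shell
#!/bin/bash
# count_letters is a hypothetical name for this sketch.
# Each line is split into characters with substr() inside awk,
# so GNU grep -o is not needed.
count_letters() {
    awk '
        {
            line = tolower($0)
            for (i = 1; i <= length(line); i++) {
                c = substr(line, i, 1)
                if (c ~ /[[:alpha:]]/) letters[c]++
                if (c ~ /[aeiou]/) vowels++
            }
        }
        END {
            for (c in letters) printf "%d\t%s\n", letters[c], c | "sort -k2,2"
            close("sort -k2,2")     # let sort finish writing before the total
            printf "%d\tvowels\n", vowels + 0
        }
    ' "$@"
}

# Usage: count_letters sampletext.txt
```

With no file argument, awk reads standard input, so printf 'Hello WORLD\n' | count_letters also works.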

Answer 2

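Bash comes with a 190-page manual, which has a table of contents and several indexes (in Appendix D). A lot of functionality is hidden in the (somewhat obscure and scary) syntax.

Most of the answer/tutorial here depends on these sections: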

www.gnu.org/software/bash/manual/bash.html#Arrays

www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion

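Arrays are nothing special: they are just a lazy person's way of avoiding having to invent lots of similar names for a group of variables. Think of the index value as the last part of the name. But because the index can itself be a variable, an array is a great tool for loops.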
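To illustrate that point, here is a small hypothetical example (the array name count and the word education are made up for illustration): one associative array stands in for five separately named counters, and because the key comes from a variable, a single loop can update and report all of them.

```shell
#!/bin/bash
# One associative array instead of count_a, count_e, ... count_u.
# The key plays the role of the last part of the variable name.
declare -A count=([a]=0 [e]=0 [i]=0 [o]=0 [u]=0)

word="education"
for (( j = 0; j < ${#word}; j++ )); do
    c=${word:j:1}
    # only count characters that are already keys of the array
    [[ -v "count[$c]" ]] && (( count[$c] += 1 ))
done

for v in a e i o u; do
    printf '%s=%d\n' "$v" "${count[$v]}"   # one loop variable, five "names"
done
```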

Here is the script, with hopefully enough comments to show the intent:

#! /bin/bash

myString="The quick brown FOX jumps over the lazy dog."

Vowel=( a e i o u )     #.. Declare a list of what we want to output. 

myString="${myString,,}"    #.. Parameter expansion that lowercases the whole string.

#.. Declares an associative array to store character frequencies.
#.. Typical values would be: Freq[e]="3", Freq[h]="2", Freq[q]="1".
#.. We store counts for all characters, to avoid multiple tests.

declare -A Freq 

#.. Iterate the string, indexing via a substring expansion,
#.. and counting the frequencies of each character.
#.. The \$ defers the substring expansion to the arithmetic evaluation,
#.. which avoids expanding the key twice.

for (( j = 0; j < ${#myString}; j++ )); do
    (( Freq[\${myString:j:1}]++ ))
done

declare -p Freq     #.. Debug: dump the frequency array.

#.. Iterate over the vowel list to report the frequencies.

for v in "${Vowel[@]}"; do
    printf 'Vowel %s occurs %2d times.\n' "${v}" "${Freq["${v}"]}"
done

Here is the output:

$ time ./Calhoun.sh
declare -A Freq=([" "]="9" [.]="1" 
    [a]="1" [b]="1" [c]="1" [d]="1" 
    [e]="3" [f]="1" [g]="1" [h]="2" [i]="1" 
    [j]="1" [k]="1" [l]="1" [m]="1" [n]="1" 
    [o]="4" [p]="1" [q]="1" [r]="2" [s]="1" 
    [t]="2" [u]="2" [v]="1" [w]="1" [x]="1" 
    [y]="1" [z]="1" )
Vowel a occurs  1 times.
Vowel e occurs  3 times.
Vowel i occurs  1 times.
Vowel o occurs  4 times.
Vowel u occurs  2 times.

real    0m0.013s
user    0m0.012s
sys 0m0.000s
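The script above counts a hardcoded string, while the question is about a text file. A minimal adaptation, assuming the whole file fits comfortably in memory; the function name vowel_report is hypothetical, and it stores only vowels rather than every character:

```shell
#!/bin/bash
# vowel_report FILE (hypothetical name): same substring loop as the
# script above, but the text comes from a file.
vowel_report() {
    local text c j v
    local -A Freq=()
    text=$(< "$1")          # whole file as one string (trailing newlines dropped)
    text=${text,,}          # lowercase, as in the script above
    for (( j = 0; j < ${#text}; j++ )); do
        c=${text:j:1}
        [[ $c == [aeiou] ]] && (( Freq[$c] += 1 ))
    done
    for v in a e i o u; do
        printf 'Vowel %s occurs %2d times.\n' "$v" "${Freq[$v]:-0}"
    done
}

# Usage: vowel_report sampletext.txt
```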

Answer 3

myString="Hello WORLD"
declare -A vowel=()      # an associative array
declare -l char          # value is lowercased upon assignment

for ((i=0; i<${#myString}; i++)); do 
    char=${myString:i:1}

    # inside [[...]], the == operator does _pattern matching_
    [[ $char == [aeiou] ]] && ((vowel[$char]++))
done

declare -p vowel   # => ([o]="2" [e]="1" )
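The loop relies on two features that are easy to miss: a variable declared with declare -l is lowercased whenever it is assigned, and an unquoted right-hand side of == inside [[ ... ]] is a glob pattern, while a quoted one is compared literally. A quick check (the variable name ch is arbitrary):

```shell
#!/bin/bash
declare -l ch
ch=E                                  # stored as "e" because of -l
echo "$ch"                            # prints: e

[[ $ch == [aeiou] ]] && pattern=match || pattern=no-match      # glob pattern
[[ $ch == "[aeiou]" ]] && literal=match || literal=no-match    # literal string
echo "pattern: $pattern, literal: $literal"   # prints: pattern: match, literal: no-match
```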

A more efficient way to loop over the characters of a string (especially if the string might be long) is:

while IFS= read -r -d '' -n1 char; do 
    [[ $char == [aeiou] ]] && ((vowel[$char]++))
done < <(
    printf '%s' "$myString"
)
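Each option on that read command matters: IFS= keeps spaces, -r keeps backslashes, and -d '' stops the newline from acting as the delimiter, so -n1 hands back every character, newlines included. A small round-trip check with made-up sample text:

```shell
#!/bin/bash
# Rebuild a string one character at a time; with IFS=, -r and -d ''
# nothing is dropped, so the result equals the input.
input=$'a b\\\nc'          # a, space, b, backslash, newline, c
rebuilt=""
count=0
while IFS= read -r -d '' -n1 ch; do
    rebuilt+=$ch
    (( count += 1 ))
done < <(printf '%s' "$input")

echo "$count characters read"   # prints: 6 characters read
[[ $rebuilt == "$input" ]] && echo "round trip OK"
```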

If you want to include vowels with a count of zero:

myString="Hello WORLD"
declare -A vowel=([a]=0 [e]=0 [i]=0 [o]=0 [u]=0)
declare -l char

while IFS= read -r -d '' -n1 char; do 
    [[ -v "vowel[$char]" ]] && ((vowel[$char]++))
done < <(printf '%s' "$myString")

for char in "${!vowel[@]}"; do
    printf '%s\t%d\n' "$char" "${vowel[$char]}"
done | sort

a   0
e   1
i   0
o   2
u   0

Answer 4

A pure awk solution that counts each vowel separately.

#define array vowels with one vowel per index (vowels[1]="a" ... vowels[5]="u");
#splitting on " " is portable, whereas splitting on "" is not specified by POSIX
BEGIN { n = split("a e i o u", vowels, " ") }

#in each line, make the line lowercase
{ $0 = tolower($0)
#for each vowel (looping through array vowels in index order),
#replace every occurrence of that vowel with an empty string.
#gsub returns the number of replacements it has made;
#this value is added to the counter array element for that vowel
for (i = 1; i <= n; i++) { count[i] += gsub(vowels[i], "") } }

#at the end, loop through array vowels in order and print each vowel with its count
END { for (i = 1; i <= n; i++) { print vowels[i], count[i] + 0 } }

Save it as, for example, count_vowels.awk and run it with

awk -f count_vowels.awk inputfile.txt

Lines starting with # are comments and can be omitted.
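The counting in this script hinges on gsub returning how many substitutions it made. A one-line check from the shell (the sample word banana is arbitrary):

```shell
#!/bin/bash
# gsub(regex, replacement, target) replaces every match in target
# and returns the number of replacements made.
result=$(awk 'BEGIN { s = "banana"; n = gsub(/a/, "", s); print n, s }')
echo "$result"    # prints: 3 bnn
```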
