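Read lines from a file and compare them against two arrays; if one or more of the words is present, the line should not be written to the result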

Question 1

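If badwords really is an array of words, then you probably want to use grep -w:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.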
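
To make the difference concrete, here is a small sketch comparing a plain match with a whole-word match. The sample lines are made up for illustration.

```shell
# Hypothetical sample input: only the first line contains "bad" as a whole word.
sample='this is bad
a badge of honor
nothing here'

# Plain match: "bad" also matches inside "badge".
plain=$(printf '%s\n' "$sample" | grep -i 'bad')

# Whole-word match: "badge" no longer counts.
whole=$(printf '%s\n' "$sample" | grep -iw 'bad')

printf 'plain:\n%s\n\nwhole word:\n%s\n' "$plain" "$whole"
```

Without -w, a filter for "bad" would also throw away harmless lines such as the "badge" one.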

So in your case:

# Declare some constants (the word list is a real array)
readonly bad_words_list=(stupid dumb bad)
readonly out_file="out_file" \
         in_file="in_file"


# The function you want
function filter_bad_words() {
    # Loop for reading line-by-line; IFS= keeps leading/trailing whitespace
    while IFS= read -r line
    do
        # Loop through the list
        # Quote the expansion so each array element stays one word
        for bad_word in "${bad_words_list[@]}"
        do
            # Check if there is a bad word
            # Options in grep: quiet, ignore case, word
            if grep -qiw "$bad_word" <<< "$line"
            then
                # Print the line with bad word to stderr
                echo "Line contains bad word: $line" 1>&2

                # Exit from this loop, continue the main one
                continue 2
            fi
        done

        # Save line into the out file
        # This will not be called if line contains bad word
        echo "$line" >> "$out_file"

    # Read from file
    done < "$in_file"
}

I'm not sure this is the most efficient solution (sed or awk could also be used), but at least it works and is pure Bash, using only grep.
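
As a rough sketch of the awk route mentioned above (not part of the original answer), the whole file can be filtered in a single pass. The sample data and the scratch directory are assumptions, and the whole-word test is emulated by hand, since awk has no direct equivalent of grep -w. Each word is treated as a regular expression, so this only suits plain words like the ones here.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'You are stupid' 'A badge of honor' 'Nice dog' 'That is BAD.' > "$tmp/in_file"

awk -v words='stupid dumb bad' '
    BEGIN { n = split(words, w, " ") }
    {
        line = tolower($0)   # case-insensitive comparison
        for (i = 1; i <= n; i++) {
            # A word must be delimited by start/end of line or a non-word character.
            if (line ~ ("(^|[^a-z0-9_])" w[i] "([^a-z0-9_]|$)")) next
        }
        print                # no bad word found: keep the original line
    }
' "$tmp/in_file" > "$tmp/out_file"

cat "$tmp/out_file"
```

This reads the input once instead of starting one grep per line and per word, which is where the loop version spends most of its time.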

Edit: If you just want to filter out these words without doing any other processing, you can also use grep -v here:

# Read file into a variable
filtered="$(< "$in_file")"

# Go through each bad word
for word in ${bad_words_list[@]}
do
    # Drop every line that contains the word (case-insensitive, whole word)
    filtered="$(grep -ivw "$word" <<< "$filtered")"
done

# Save final result
echo "$filtered" > "$out_file"

Question 2

You're overcomplicating things (and you really shouldn't use a shell loop to process text):

pets='Dog
Cat
Mouse
Horse'

badword='Stupid
Dumb
Bad'

grep  -Fe "$pets"    < input.txt > pets.txt
grep -vFe "$badword" < input.txt > input-without-badword.txt

Or, to combine the two:

grep -Fe "$pets" < input.txt |
  grep -vFe "$badword" > pets-without-badword.txt

grep accepts several lines as the pattern (or as Fixed strings with -F), in which case it looks for any of those lines in the input.
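
A small self-contained run of the combined pipeline shows the multi-line pattern at work. The contents of input.txt and the scratch directory are made up for illustration.

```shell
# Hypothetical input: four lines mention a pet, two of those also contain a bad word.
tmp=$(mktemp -d)
printf '%s\n' 'My Dog is nice' 'Stupid Cat' 'A Mouse ran by' 'Bad Horse' 'No animals here' > "$tmp/input.txt"

pets='Dog
Cat
Mouse
Horse'

badword='Stupid
Dumb
Bad'

# Keep lines naming any pet, then drop those containing any bad word.
result=$(grep -Fe "$pets" < "$tmp/input.txt" | grep -vFe "$badword")
printf '%s\n' "$result"
```

Note that the match is case-sensitive and not limited to whole words; adding -i or -w to either grep changes that.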

If you have to use arrays instead of multi-line strings, you can do:

# fish / rc / zsh -o rcexpandparam
grep -F -e$array < input > output

# zsh
grep -F -e$^array < input > output

# mksh / bash / zsh
grep -F "${array[@]/#/-e}" < input > output

# ksh93
grep -F "${array[@]/*/-e\0}" < input > output
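
To see what the mksh / bash / zsh form hands to grep, here is a minimal bash sketch. The array contents and input lines are assumptions.

```shell
# Hypothetical array and input lines, for illustration only.
array=(Stupid Dumb Bad)

# Each element gets "-e" prepended, so grep receives one -e option per word.
printf '<%s>\n' "${array[@]/#/-e}"

# Same expansion in use: drop every line containing any of the words.
kept=$(printf '%s\n' 'Stupid Cat' 'Nice Dog' 'Bad Horse' | grep -vF "${array[@]/#/-e}")
printf '%s\n' "$kept"
```

The first printf shows the three resulting arguments (-eStupid, -eDumb, -eBad), each in angle brackets; the quotes around the expansion keep elements containing spaces intact.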

Though in mksh / ksh93 / zsh / bash, you can also join the elements of the array with newlines:

IFS=$'\n'
grep -Fe "${array[*]}" < input > output

Or in zsh:

grep -Fe ${(pj[\n])array} < input > output
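
In bash, the newline join shown earlier changes IFS for the rest of the script. One way to avoid that is to do the join inside a function with a local IFS. This is only a sketch; join_lines is a made-up helper name, and the array and input lines are sample data.

```shell
# join_lines is a hypothetical helper: prints its arguments joined by newlines.
join_lines() {
    local IFS=$'\n'      # only affects this function
    printf '%s' "$*"     # "$*" joins on the first character of IFS
}

array=(Stupid Dumb Bad)
patterns=$(join_lines "${array[@]}")

# One grep call with a multi-line fixed-string pattern.
kept=$(printf '%s\n' 'Stupid Cat' 'Nice Dog' 'Bad Horse' | grep -vFe "$patterns")
printf '%s\n' "$kept"
```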

Answer