Bash: ignore special characters

Question 1

You can use a couple of tr translations:

tr "'"'\#$%.,:;?!&*|()[]"<>=-' ' ' <SomeFile | tr -s '[:space:]' "\n"

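The first one turns any unwanted characters into spaces. The second one turns all whitespace (including newlines) into newlines and squeezes each run of newlines into a single one.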
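To try the pipeline on a string instead of SomeFile, it can be wrapped in a small function. This is a minimal sketch: the function name split_words and the sample text are illustrative only, and the two tr calls are the ones given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads text on stdin, writes one word per line.
split_words() {
    # Stage 1: map every listed punctuation character to a space.
    # Stage 2: turn every whitespace character into a newline and
    #          squeeze runs of newlines into a single newline.
    tr "'"'\#$%.,:;?!&*|()[]"<>=-' ' ' | tr -s '[:space:]' '\n'
}

# Usage: punctuation disappears and each word ends up on its own line.
printf 'Hello, world! (foo) [bar]\n' | split_words
```

Because the second tr squeezes, the several spaces left behind by ", " or ") [" collapse into one line break instead of producing empty lines.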


Question 2

For the input file SomeFile:

Examples:for9 developers>http://example.org/examples?s=%20&<what>
Is, this?

this produces the following output:

examples
for
developers
http://example.org/examples?s=%20&
what
is
this

I think this can be done with just tr plus the shell:

for i in $(<SomeFile tr -cs ']a-zA-Z0-9/:.%?=&_,+()~['\''#$;!*-' '\n' | \
    tr '[:upper:]' '[:lower:]'); do
    case "$i" in
        *://*)
            echo "$i" >> net.txt ;;
        *)
            for split in $(echo "$i" | tr -c 'a-z' '\n'); do
                echo "$split" >> net.txt
            done ;;
    esac
done
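The loop above iterates over an unquoted $(...), so a token containing ?, * or [ (the URL contains a ?) is also subject to pathname expansion and could, in principle, be replaced by matching file names. A sketch of the same loop as a hypothetical function classify_tokens that switches globbing off with set -f and writes to standard output instead of appending to net.txt; the function body is a subshell, so set -f does not leak into the caller.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the tr + shell loop: reads stdin, writes stdout.
classify_tokens() (
    set -f  # disable pathname expansion for the unquoted $(...) words
    for i in $(tr -cs ']a-zA-Z0-9/:.%?=&_,+()~['\''#$;!*-' '\n' |
               tr '[:upper:]' '[:lower:]'); do
        case "$i" in
            *://*)
                # Anything that looks like a URL is kept whole.
                echo "$i" ;;
            *)
                # Everything else is split on non-letters; word splitting
                # drops the empty pieces left by digits and punctuation.
                for split in $(echo "$i" | tr -c 'a-z' '\n'); do
                    echo "$split"
                done ;;
        esac
    done
)

# Usage with the sample input from the question.
printf '%s\n' 'Examples:for9 developers>http://example.org/examples?s=%20&<what>' 'Is, this?' |
    classify_tokens
```

Redirecting the call with > net.txt reproduces the original file output while opening the file only once.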

But adding grep after tr is probably simpler:

< SomeFile tr -cs ']a-zA-Z0-9/:.%?=&_,+()~['\''#$;!*-' '\n' | \
    tr '[:upper:]' '[:lower:]' | grep -o '.*://.*\|[a-z]*' > net.txt

Neither version needs cat: just redirect the file to tr's standard input.

Or let grep do the matching, with tr only lowercasing:

grep -oE '[a-zA-Z]+://[]a-zA-Z0-9/:.%?=&_,+()~['\''#$;!*-]+|[[:alpha:]]+' \
    -- SomeFile | tr '[:upper:]' '[:lower:]' > net.txt
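This works because POSIX extended regular expressions return the leftmost-longest match: at the "h" of "http" both alternatives match, and the URL alternative is longer, so grep -o prints the whole URL rather than just "http". A sketch with the same pattern in a hypothetical function extract_tokens that reads standard input instead of SomeFile:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: grep -oE prints each match on its own line,
# then tr lowercases everything.
extract_tokens() {
    grep -oE '[a-zA-Z]+://[]a-zA-Z0-9/:.%?=&_,+()~['\''#$;!*-]+|[[:alpha:]]+' |
        tr '[:upper:]' '[:lower:]'
}

# Usage with the sample input from the question.
printf '%s\n' 'Examples:for9 developers>http://example.org/examples?s=%20&<what>' 'Is, this?' |
    extract_tokens
```

The URL match stops at "<" because that character is not in the bracket expression, which is why "what" comes out as a separate word.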

zsh can do it with arrays:

file=( ${(L)=$(< SomeFile)//[^]a-zA-Z0-9\/:.%?=&_,+()~[\'#$;!*-]/ } )
printf '%s\n' ${(M)file:#*://*} ${=${file:#*://*}//[^a-z]/ }

This prints all the URLs first, then any "words".
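For comparison, a rough bash counterpart of the zsh array version, assuming bash 4 or later for mapfile. It collects URLs and the remaining words into two separate arrays so that, like the zsh code, it can print all URLs before the words. The function name urls_then_words is hypothetical, and the token filter is the same tr call used in the earlier versions.

```shell
#!/usr/bin/env bash
# Hypothetical function: reads text on stdin, prints URLs first, then words.
urls_then_words() {
    local -a tokens parts urls=() words=()
    local t
    # One candidate token per array element, lowercased.
    mapfile -t tokens < <(tr -cs ']a-zA-Z0-9/:.%?=&_,+()~['\''#$;!*-' '\n' |
        tr '[:upper:]' '[:lower:]')
    for t in "${tokens[@]}"; do
        if [[ $t == *://* ]]; then
            urls+=("$t")                  # keep URLs whole
        else
            # Replace every non-letter with a space, then let read split
            # on whitespace; read does no filename globbing.
            read -ra parts <<< "${t//[^a-z]/ }"
            words+=("${parts[@]}")
        fi
    done
    # All URLs first, then all words, one per line.
    printf '%s\n' "${urls[@]}" "${words[@]}"
}

# Usage with the sample input from the question.
printf '%s\n' 'Examples:for9 developers>http://example.org/examples?s=%20&<what>' 'Is, this?' |
    urls_then_words
```

Splitting with read -a instead of an unquoted $(...) keeps the words from ever going through pathname expansion.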

Answer