Command that takes a file and puts each word on its own line

Question 1

grep's -o option is well suited to this: it prints each match on its own line.

grep -E -o '[[:alpha:]]{2,}' file.txt

If you want the words in lowercase:

grep -E -o '[[:alpha:]]{2,}' file.txt | tr '[:upper:]' '[:lower:]'

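Note that the grep regex uses double brackets while tr does not. In a regular expression, a character class such as [:alpha:] must itself sit inside a bracket expression. tr does not use regular expressions; it takes character sets, so the class is written on its own.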
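As a quick check of the behaviour described above, here is a minimal sketch that runs the same pipeline on a small sample. The function name `words_lower` and the sample sentence are invented for illustration, and input is read from standard input instead of file.txt.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print every run of
# two or more letters on its own line, converted to lowercase.
words_lower() {
    grep -E -o '[[:alpha:]]{2,}' | tr '[:upper:]' '[:lower:]'
}

# Invented sample input. The punctuation, the digit and the one-letter
# word "I" never match the pattern, so they do not appear in the output.
printf 'Hello, World! I have 2 cats.\n' | words_lower
```

With this sample the output should be the four words hello, world, have and cats, one per line.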

Question 2

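Try:

cat file.txt | tr '[:upper:]' '[:lower:]' | tr '\n' ' ' | sed -E 's/[ \t]+/\n/g'

Your tr -d '\n' joins words together, because it deletes the newlines that separated them. The tr '\n' ' ' above keeps the words apart by turning each newline into a space.

The spacing then has to be turned into line breaks, which is what the sed above does: [ \t]+ matches a run of spaces (or tabs) and replaces it with a single newline \n.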
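The difference between the two tr calls is easy to see on a two-line sample. This is only an illustrative sketch; the helper `sample` and its text are made up.

```shell
#!/bin/sh
# Invented two-line sample.
sample() { printf 'one two\nthree\n'; }

# Deleting newlines glues the last word of one line to the first word of the next.
joined=$(sample | tr -d '\n')

# Turning newlines into spaces keeps a separator between them
# (and leaves one trailing space where the final newline was).
spaced=$(sample | tr '\n' ' ')

printf '[%s]\n' "$joined"
printf '[%s]\n' "$spaced"
```

The first line printed should read [one twothree] and the second [one two three ], which shows why the later sed step still has whitespace to split on.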

Question 3

To answer the question in the title:

Command that takes a file and puts each word on its own line

you can do:

<file tr '\n\t\r' ' '' '' ' | tr -s ' ' '\n'   # needs three spaces !

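This converts newlines, tabs and carriage returns to spaces, and then translates the spaces to newlines, squeezing (-s) each run of consecutive spaces into a single newline.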
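A small sketch of the two-step tr approach, wrapped in a function so it can be tried on arbitrary input. The name `split_words` and the sample string are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: split standard input into one word per line.
# Step 1 maps newline, tab and carriage return to a space each.
# Step 2 maps spaces to newlines and squeezes repeated newlines into one.
split_words() {
    tr '\n\t\r' '   ' | tr -s ' ' '\n'
}

# Invented sample with a double space, a tab and a CR-LF line ending.
printf 'alpha  beta\tgamma\r\ndelta\n' | split_words
```

This should print alpha, beta, gamma and delta on four separate lines. Input that starts with whitespace would still yield one empty first line, which is why the later filtering step also deletes empty lines.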

Since tr is already in use, you can also have it convert uppercase to lowercase in the same call:

<file tr '[:upper:]\n\t\r' '[:lower:]   ' | tr -s ' ' '\n'

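Or you can do the same thing in GNU sed (note that this reads the whole file into memory and assumes the file contains no NUL bytes; sed's y command does not expand ranges such as A-Z, so both alphabets are written out in full):

<file sed -zE -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\t\r/abcdefghijklmnopqrstuvwxyz   /;s/ +/\n/g'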

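Then, to answer the question in the body:

(a word is defined as a consecutive sequence of letters, so 1-letter words are not counted) and removes all blank lines.

You can delete words that contain characters other than a-z, one-character words, and empty lines: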

sed -E '/[^a-z]/d;/^.$/d;/^$/d'

This can be shortened to the slightly more cryptic:

sed -E '/[^a-z]/d;/^(.|)$/d'
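To see that the short form deletes the same lines as the long one, both can be run over the same word list. This is a sketch that assumes GNU sed (the empty alternative in `(.|)` is not guaranteed by POSIX); the helper `sample` and its word list are invented.

```shell
#!/bin/sh
# Invented sample: one candidate word per line, including an empty line,
# a one-letter word, a word with an apostrophe and a word with a digit.
sample() { printf '%s\n' cat a '' "it's" dog x9 bb; }

# Long form: three separate delete rules.
long=$(sample | sed -E '/[^a-z]/d;/^.$/d;/^$/d')

# Short form: "one character or nothing" folded into a single rule.
short=$(sample | sed -E '/[^a-z]/d;/^(.|)$/d')

printf '%s\n' "$long"
[ "$long" = "$short" ] && echo "both forms agree"
```

Both variables should hold cat, dog and bb. Note that a word containing any other character is dropped as a whole line, not trimmed.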

All in one line, either:

<file tr '[:upper:]\n\t\r' '[:lower:]   ' | tr -s ' ' '\n' | sed -E '/[^a-z]/d;/^(.|)$/d'

or:

<file sed -zE -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\t\r/abcdefghijklmnopqrstuvwxyz   /;s/ +/\n/g' | sed -E '/[^a-z]/d;/^(.|)$/d'

Commented version (for GNU sed):

# Read `file` with sed in NUL-separated mode (-z) and extended regex syntax (-E).
<file sed -zE -e '
    # Transliterate (y) uppercase to lowercase and newline, tab, CR to space.
    # y does not accept ranges such as A-Z, so both alphabets are spelled out.
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ\n\t\r/abcdefghijklmnopqrstuvwxyz   /
    # Replace each run of spaces with **one** newline.
s/ +/\n/g
' | sed -E '
    # Second call to sed, now working line by line.
    # Delete (d) lines that contain any character outside the range a-z.
/[^a-z]/d
    # Delete any line that is empty or has exactly one character.
/^(.|)$/d
'
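For a concrete run of the whole pipeline, the sketch below wraps the tr variant and the sed filter in one function. It assumes GNU tr and GNU sed, as the answer does; the function name `clean_words` and the sample line are made up.

```shell
#!/bin/sh
# Hypothetical wrapper around the tr + sed pipeline shown above.
clean_words() {
    tr '[:upper:]\n\t\r' '[:lower:]   ' |   # lowercase, whitespace to spaces
        tr -s ' ' '\n' |                    # one token per line
        sed -E '/[^a-z]/d;/^(.|)$/d'        # drop non a-z, 1-char and empty lines
}

# Invented sample: mixed case, a tab, trailing punctuation and a 1-letter word.
printf 'The Quick brown\tfox, a DOG\n' | clean_words
```

This should print the, quick, brown and dog. The token "fox," is removed entirely because of its comma, unlike the grep -o approach, which would extract fox from it.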

Question 4

$ echo '  HTE ONTE NOTEH ONTEH E E O  AOE  ' | perl -pe '$_ =~ s/\b\w\b//g; $_ =~ s/\W*(\w+)\W*/\L$1\n/g'
hte
onte
noteh
onteh
aoe

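This uses Perl to first remove any single-character words from the input, then to extract each remaining word, lowercase it, strip the surrounding non-word characters, and print each word on its own line.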
