I have this bash script:
#!/bin/bash
cat $@ | while read line
do
    for word in $line
    do
        echo $word | circling-the-square
        # here's where i need to add the if statement:
        #if the word contains one of the four [!?.,],
        #then also echo that punctuation mark
    done
done
circling-the-square is a Python spelling-corrector script based on Norvig's.
That script strips punctuation from its input:
def words(text): return re.findall('[a-z]+', text.lower())
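so I need bash to take care of this. I suppose sed or awk might be useful, but I still don't know how to write that regex or how to put it in an if statement, which is why I'm asking here.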
As things stand, passing this file:
alec@ROOROO:~/oddi-o/newton-fluxions$ cat 199
advertisement lately publijtid by the author, the british hemisphere, or a map of a new contrivance, proper for initiating young minds in the firft rudiments of geography, and the ufe of the globes.
gives:
alec@ROOROO:~/oddi-o/newton-fluxions$ ./hmmb 199
advertisement
lately
publijtid
by
the
author
the
british
hemisphere
or
a
map
of
a
new
contrivance
proper
for
initiating
young
minds
in
the
first
rudiments
of
geography
and
the
few
of
the
globes.
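This isn't perfect, but it's still useful. For reference, I have edited the file in question so that it contains only \w and the punctuation marks [!?.,]. The file has no characters such as : or ;, so I only need the script to echo these four punctuation marks when they appear as part of a word, i.e.: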
alec@ROOROO:~/oddi-o/newton-fluxions/finforno$ ./hmmb 199
advertisement
lately
publijtid
by
the
author,
the
british
hemisphere,
or
a
map
of
a
new
contrivance,
proper
for
initiating
young
minds
in
the
firft
rudiments
of
geography,
and
the
ufe
of
the
globes.
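Answer 1
Use a regular expression like the one below. It looks for words that contain one or more of the specified punctuation marks and prints the word together with a matched punctuation mark. Note that because the leading .* is greedy, the captured mark is in practice the last one in the word rather than the first. You can extend it as needed.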
if [[ "$word" =~ ^.*([!?.,])+.*$ ]]
then
echo "Found word: $word containing punctuation mark: ${BASH_REMATCH[1]}"
fi
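As a rough sketch of how that test could sit inside the loop from the question: the version below assumes the punctuation only ever appears at the end of a word, and it uses a hypothetical stand-in function, correct_word, in place of circling-the-square (the stand-in only lowercases and strips non-letters; it does no spelling correction). The names correct_word and process_line are made up for illustration.

```bash
#!/bin/bash
# Hypothetical stand-in for `circling-the-square`: it only mimics the
# punctuation-stripping step (keeps a-z) and corrects nothing.
correct_word() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -cd 'a-z\n'
}

# Print the processed form of every word on a line, one per line,
# re-attaching any trailing run of [!?.,] that the stand-in removed.
process_line() {
    local line=$1 word fixed
    local -a words
    local re='([!?.,]+)$'            # regex kept in a variable, used unquoted below
    # read -a splits on whitespace without glob-expanding words such as "what?"
    read -r -a words <<< "$line"
    for word in "${words[@]}"; do
        fixed=$(correct_word "$word")
        if [[ $word =~ $re ]]; then
            fixed+=${BASH_REMATCH[1]}
        fi
        printf '%s\n' "$fixed"
    done
}

# Demo on part of the sample text from the question.
process_line "by the author, the british hemisphere, and the globes."
```

To use this on a file, the function could be called from a `while IFS= read -r line; do process_line "$line"; done < file` loop, with correct_word replaced by a call to the real corrector. Splitting with read -a instead of an unquoted `for word in $line` is a deliberate choice here, since a word ending in ? would otherwise be treated as a glob pattern.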
Answer 2
It sounds like bash regular expressions could help here. There is a Stack Overflow discussion on the topic: https://stackoverflow.com/questions/304864/how-do-i-use-regular-expressions-in-bash-scripts