Scrabble helper in bash

Question 1

Regular expressions are not the best tool for this kind of job. I would do something like:

perl -CLASD -lne '
  BEGIN{$l0{$_}++ for (split "", shift)}
  %l = %l0; for (split "") {next LINE unless $l{$_}--}
  print' aacrt < /usr/share/dict/words

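Or, since Scrabble (at least in French and English, and in a few other languages that use the Latin alphabet) only has the 26 uppercase letters A to Z (été is written ETE, cœur is written COEUR), with GNU iconv: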

iconv -t us//TRANSLIT < /usr/share/dict/words |
  perl -CLASD -lne '
    BEGIN{$l0{$_}++ for (split "", uc shift)}
    %l = %l0; for (split "", uc $_) {next LINE unless $l{$_}--}
    print' croeu

Or, to print the matching words in their original form:

perl -CLASD -MText::Unidecode -lne '
  BEGIN{$l0{$_}++ for (split "", uc shift)}
  %l = %l0; for (split "", uc unidecode $_) {next LINE unless $l{$_}--}
  print' croeu < /usr/share/dict/words
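
The letter-counting idea behind the Perl commands above can also be sketched in plain shell. This is only an illustration, not the answer's method: fits_rack and filter_rack are hypothetical helper names, the check is case-sensitive, and it assumes plain ASCII letters.

```shell
# fits_rack RACK WORD: succeed if WORD can be spelled with the letters in RACK,
# using each rack letter at most once. Hypothetical helper; assumes ASCII letters.
fits_rack() (
  rack=$1
  word=$2
  while [ -n "$word" ]; do
    rest=${word#?}        # word without its first character
    ch=${word%"$rest"}    # the first character itself
    word=$rest
    case $rack in
      *"$ch"*) rack=${rack%%"$ch"*}${rack#*"$ch"} ;;  # use up one copy of the letter
      *) return 1 ;;                                   # letter missing or already used up
    esac
  done
  return 0
)

# filter_rack RACK: read candidate words on stdin, print those that fit the rack.
filter_rack() {
  while IFS= read -r candidate; do
    if fits_rack "$1" "$candidate"; then
      printf '%s\n' "$candidate"
    fi
  done
}
```

It would be used as filter_rack aacrt < /usr/share/dict/words. A shell loop like this is much slower than the Perl one-liner on a full dictionary, so it is mainly useful for seeing the logic spelled out.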

Question 2

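What is happening here is that {a,c,r,t}{a,c,r,t}{a,c,r,t}{a,c,r,t} is expanded by the shell you are using. This means that the first expansion (aaaa) becomes the pattern that grep searches for, and the remaining ones (aaac, aaar, and so on) are treated as file names, just as if you had typed: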

grep aaaa aaac aaar aaat aaca ..... /usr/share/dict/words

Put the search pattern in single quotes to prevent this from happening:

grep '{a,c,r,t}{a,c,r,t}{a,c,r,t}{a,c,r,t}' /usr/share/dict/words
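
The effect of quoting can be checked on a smaller pattern. The sketch below runs the two commands through bash -c explicitly, on the assumption that bash is installed, because brace expansion is not a feature of plain POSIX sh; the variable names are only illustrative.

```shell
# Run under bash explicitly, since brace expansion is not a POSIX sh feature.
expanded=$(bash -c 'echo {a,c}{r,t}')     # braces unquoted inside the bash script
literal=$(bash -c 'echo "{a,c}{r,t}"')    # braces quoted inside the bash script
printf 'unquoted: %s\nquoted:   %s\n' "$expanded" "$literal"
```

The unquoted form reaches echo as four separate words (ar at cr ct), while the quoted form arrives as the literal string {a,c}{r,t}.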

On the other hand, I am not sure you are using the right grep syntax here. I would use:

grep '[acrt][acrt][acrt][actr]' /usr/share/dict/words

This matches any line containing four consecutive characters from that set. Or, as @mueh commented:

grep -xE '[acrt]{1,4}' /usr/share/dict/words

which matches whole lines made up of 1 to 4 of these letters (a letter may appear more than once).
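
To see what the bracket expression does and does not check, here is a small sketch on sample input instead of the real dictionary. rack_candidates is a hypothetical wrapper name, and it assumes the letters passed in are plain letters with no characters that are special inside a bracket expression.

```shell
# rack_candidates LETTERS MAXLEN: keep stdin lines made only of LETTERS,
# between 1 and MAXLEN characters long. Hypothetical wrapper around grep -xE.
rack_candidates() {
  grep -xE "[$1]{1,$2}"
}

printf 'cart\ntact\ncarts\ntaco\n' | rack_candidates acrt 4
```

Both cart and tact are printed, even though tact needs two t tiles. The character class limits which letters may appear, not how many times each one is used, so a counting step is still needed if every tile may be played only once.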

Question 3

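Bash brace expansion does not generate valid permutations: the resulting set includes items that repeat the same character and items that leave some characters out.

What you need is an anagram tool that uses all or some of the characters. Fortunately, such a tool is already available on Linux (it may need to be installed from your distribution's package repository). It is called an. It uses /usr/share/dict/words as its default dictionary.

Here is an example of how to use it.

First define this function (doing so interactively is fine):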

$ mywords() { an -w "$1" -m 4 | awk '/^[a-z]*$/ {print length($0), $0}' | column; }

Now suppose you have the letters ypltar. To find the valid dictionary words that use all or some of them:

$ mywords ypltar
6 partly        5 party         4 tray          4 tarp          4 pray          4 part
6 paltry        5 aptly         4 trap          4 rapt          4 play          4 arty

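I used -m 4 in the function to restrict the output to words of at least 4 letters. You can change it as needed. The awk part excludes dictionary entries that contain uppercase letters (proper names and so on).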
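
The awk stage can be tried on its own, without an installed, to see what it keeps. In this sketch lowercase_with_length is a hypothetical name for that stage, and LC_ALL=C is an added assumption so that [a-z] means exactly the ASCII lowercase letters regardless of locale.

```shell
# lowercase_with_length: keep stdin lines made only of lowercase ASCII letters
# and prefix each with its length. Hypothetical name for the awk stage of mywords.
lowercase_with_length() {
  LC_ALL=C awk '/^[a-z]*$/ {print length($0), $0}'
}

printf 'party\nPaul\ntray\n' | lowercase_with_length
```

On this input the capitalized entry Paul is dropped, and party and tray come out as "5 party" and "4 tray".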

Answer