Disable man / nroff mangling of quotes and dashes

2024-6-11

TL;DR:

How can I disable the magic interpretation of ' and - in input such as foo-bar:

❯ echo "\'hello\' --foo-bar" | nroff
´hello´ ‐‐foo‐bar
^     ^      ^ are not the original input

Long version:

When rendering as UTF-8, man mangles the us-ascii dashes and quotes in my pages.

Here is an example with rg (ripgrep):

❯ man rg | grep -- ', --after'
       -A NUM, --after‐context=NUM
#                     ^ this is not U+002D, it is U+2010

❯ man rg | grep -- ', --after-'
#                            ^ when I add the dash it doesn't find anything as they are not the same

However, when searching the man page source itself, it is just a plain dash (U+002D):

❯ cat rg.1 | grep -- 'after-context'
\fB\-A\fP \fINUM\fP, \fB\-\-after-context\fP=\fINUM\fP

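This completely breaks searching inside man for me, because typing /after-<enter> finds nothing. My keyboard has no U+2010 key, and I should not have to care about the subtle differences between soft hyphen (U+00AD), modifier letter minus sign (U+02D7), hyphen (U+2010), non-breaking hyphen (U+2011), figure dash (U+2012), en dash (U+2013), hyphenation point (U+2027), hyphen bullet (U+2043), minus sign (U+2212), or roman uncia sign (U+10191), or about whichever of them man chooses to render.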

The same applies to quotes. For some reason, man changes plain quotes into opening and closing quotes:

❯ man rg | grep -- 'path:none'
               --colors ’path:none’ \
               --colors ’path:none’ \
#                       |         ^ not U+0027 but U+2019
#                       ^ not U+0027 but U+2019

This one is actually worse, because it renders both as right single quotation marks.

Again, the source uses the normal U+0027:

❯ cat rg.1 | grep -- 'path:none'
    \-\-colors 'path:none' \\
    \-\-colors 'path:none' \\
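
To see which lines of the rendered page contain something other than plain ASCII, a small filter can help. This is only a sketch: find_non_ascii is a hypothetical helper name, and it assumes a grep that matches byte by byte under LC_ALL=C (GNU grep does).

```shell
# Hypothetical helper: print every line of stdin (with its line number)
# that contains a byte which is neither printable ASCII nor whitespace.
# LC_ALL=C makes grep treat each byte of a UTF-8 sequence separately.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]'
}
```

For example, man rg | find_non_ascii would list the rendered lines that contain a substituted dash or quote, while the pure-ASCII source lines shown above would not be listed.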

How can I disable this behavior?

Using nroff -Tlatin1 does not work when the man page itself contains UTF-8 characters.

Versions and locale:

❯ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"

❯ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=

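Answer 1

nroff interprets these characters in ways you do not want. Consider a preprocessing step with sed that replaces or removes the special characters before the text is passed to nroff.

You can try this: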

echo "\'hello\' --foo-bar" | sed 's/--/\\&/g' | nroff

This sed command replaces every -- with \-- before the text is passed to nroff, which may help prevent nroff from interpreting it as a special sequence.
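
A different angle from the preprocessing idea above is to clean up the rendered text afterwards. The following is a rough sketch, not a way to switch the behavior off in nroff itself: ascii_punct is a hypothetical helper name, and the set of characters it folds back (U+2010 and U+2019 from the output in the question, plus U+2212 and U+2018) is an assumption.

```shell
# Hypothetical helper: fold the Unicode hyphen, minus sign, and curly
# single quotes in rendered output back to their ASCII equivalents.
# The characters are built from their UTF-8 bytes (octal escapes) so the
# script itself stays pure ASCII.
ascii_punct() {
    hyphen=$(printf '\342\200\220')   # U+2010 HYPHEN
    minus=$(printf '\342\210\222')    # U+2212 MINUS SIGN
    lquote=$(printf '\342\200\230')   # U+2018 LEFT SINGLE QUOTATION MARK
    rquote=$(printf '\342\200\231')   # U+2019 RIGHT SINGLE QUOTATION MARK
    sed -e "s/$hyphen/-/g" -e "s/$minus/-/g" \
        -e "s/$lquote/'/g" -e "s/$rquote/'/g"
}
```

With this in place, man rg | ascii_punct | grep -- ', --after-' should match again. It only helps in pipelines; searching inside the pager is unaffected.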
