Is there a detex-like program for Windows?

Answer 1

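If you look at the detex source code (written in C), you will see that most of the work is done by lex (the lexical analyzer) with the help of a small sed script. I checked, and unfortunately detex has not been ported to Cygwin, but I think you should be able to compile it on Cygwin (you have flex (free lex), gcc, GNU sed, etc.).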

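The other, less sophisticated option is to write your own sed (or Perl) script. Obviously you would need to run it in Cygwin. I'm at work right now, but I'm sure I have seen sed one-liners that do detex's job quite well. I'll try to find or write such a script and post it here. I'll also try to put up a 100-point bounty for such a sed one-liner. If you search Google, you should be able to find Perl scripts that do this.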
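Until such a one-liner turns up, here is a rough sketch of the do-it-yourself sed route, assuming a sed that supports `-E` (GNU sed, as shipped with Cygwin, does). The function name `detex_sketch` is hypothetical and this is not a port of detex. In order, it drops `%` comments (but not `\%`), removes `\begin{...}`/`\end{...}`, replaces `\cmd{arg}` or `\cmd[opt]{arg}` with `arg` (innermost first), drops remaining bare `\cmd` words and stray braces, unescapes `\% \& \$ \# \_`, and trims trailing whitespace. It does not handle math, verbatim environments, `\\` line breaks, or arguments split across lines, and it keeps the arguments of preamble commands such as `\documentclass`.

```shell
# detex_sketch (hypothetical name): rough sed approximation of detex.
# Reads LaTeX on stdin, writes plain text on stdout. Steps, in order:
#   1. strip % comments that are not escaped as \%
#   2. delete \begin{...} and \end{...}
#   3. loop (:a ... ta): replace \cmd{arg} / \cmd[opt]{arg} with arg
#   4. delete remaining \cmd words
#   5. delete stray braces
#   6. unescape \% \& \$ \# \_
#   7. trim trailing whitespace
detex_sketch() {
  sed -E \
    -e 's/(^|[^\\])%.*$/\1/' \
    -e 's/\\(begin|end)\{[^{}]*\}//g' \
    -e ':a' \
    -e 's/\\[a-zA-Z]+\*?(\[[^]]*\])?\{([^{}]*)\}/\2/' \
    -e 'ta' \
    -e 's/\\[a-zA-Z]+\*?//g' \
    -e 's/[{}]//g' \
    -e 's/\\([%&$#_])/\1/g' \
    -e 's/[[:space:]]+$//'
}
```

Usage: `detex_sketch < file.tex > file.txt`. The `:a ... ta` loop re-runs the substitution after every successful replacement, so `\textbf{a \emph{b}}` is unwrapped from the inside out.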

Edit: Try this script, which uses DVI as an intermediate format and then catdvi to strip the LaTeX markup.

$ latex file.tex
$ catdvi -e 1 -U file.dvi | sed -re "s/\[U\+2022\]/*/g" \
  | sed -re "s/([^^[:space:]])\s+/\1 /g" > file.txt
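The two sed stages at the end of that pipeline can be tried on their own, without catdvi. The sketch below wraps them in a hypothetical function `catdvi_cleanup`; it assumes GNU sed, because `-r` and `\s` are GNU extensions. The first stage turns the literal `[U+2022]` (U+2022 is the Unicode bullet) into `*`, and the second squeezes each run of whitespace that follows a character other than whitespace or `^` into one space.

```shell
# catdvi_cleanup (hypothetical name): the two post-processing sed stages
# from the catdvi pipeline, applied to stdin. Requires GNU sed (-r, \s).
catdvi_cleanup() {
  # Stage 1: "[U+2022]" -> "*"
  # Stage 2: non-space, non-"^" character + whitespace run -> character + one space
  sed -re 's/\[U\+2022\]/*/g' \
    | sed -re 's/([^^[:space:]])\s+/\1 /g'
}
```

For example, `printf 'a [U+2022] b   c\n' | catdvi_cleanup` should print `a * b c`. Leading indentation survives, because the second stage only touches whitespace that follows a visible character.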

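I also checked this for people who want to take the DVI route: dvi2tty does an excellent job of converting a DVI file into a plain-text file. No additional processing is needed.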

There is also another well-known sed script, tex2xml, written by Tilmann Bitterberg, for converting TeX to XML. I'll try to fix it so that it converts to plain ASCII instead.

#! /bin/sed -f

# Try of a nested tag{value} parser:
# - handles multiline tags
# - can deal with quoted \{ and \}
# - handles nested tags
# Limitations:
# - tags are not allowed to have [{}<>| ] in the name.
# - doesn't detect unbalanced brackets
#
# b{foo} -> <b>foo</b>
# b{foo em{bar}} -> <b>foo <em>bar</em></b>

# Tue Nov 27 17:28:32 UTC 2001

# \{1{2{3{4{5{6{7{8{9{a{b{c{d{e{f{g{h{i{\{text0\}}}}}}}}}}}}}}}}}}}text1\}

# How it works
# We build a stack of unclosed tags in holdspace
# by appending always at the end (``H'').
# when a closing bracket is found, fetch tag
# from holdspace.
# Main focus is small memory usage

# escape Quoted and generate entities
# (a bare & in a sed replacement means "the whole match", so write \&)
s,&,\&amp;,g
s,<,\&lt;,g
s,>,\&gt;,g
s,\\{,\&obrace;,g
s,\\},\&cbrace;,g

# uninteresting line, jump to end
/[{}]/!b unescape

:open

/{/{
  s,\( *\)\([^|<>}{ ]*\){,\1\
\2\
,;           # Isolate tag
  # Patternspace: text \n newtag \n text
  H;         # append to holdspace
  s,\n\([^\n]*\)\n,<\1>,; # generate XML tag

  # Holdspace: ..\tagN \n text \n newtag \n text
  # We only want oldtags + newtag
  x
  s,\(.*\n\)[^\n]*\n\([^\n]*\)\n[^\n]*$,\1\2,
  x

  /^[^{]*}/b close
  /{/b open
}

:close

/}/{
  s,},\
\
\
,
  # text1 \n\n\n text2 \n\n tag0 \n tag1 text2 may be empty
  G;
  s,\n\n\n\([^\n]*\)\n.*\n\([^\n]*\)$,</\2>\1,
  x
  s,\n[^\n]*$,,;   # delete tag from holdspace
  x

  /^[^}]*{/b open;   # if next bracket is an open one
  /}/b close;        # another one?
}

:unescape
s,&obrace;,{,g
s,&cbrace;,},g
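The script above still emits XML. One simple way to get plain text out of it (a post-processing step I'm swapping in, not a fix to the script itself) is to pipe its output through a second sed that drops every `<...>` tag and then decodes the `&lt;`, `&gt;` and `&amp;` entities the script generates. The function name `xml_to_text` and the file name `tex2xml.sed` are hypothetical.

```shell
# xml_to_text (hypothetical name): turn tex2xml output into plain text.
# Tags are removed first, so decoded "<" and ">" are not mistaken for tags;
# &amp; is decoded last, so "&amp;lt;" correctly becomes "&lt;", not "<".
xml_to_text() {
  sed -e 's/<[^>]*>//g' \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&amp;/\&/g'
}
```

Usage, assuming the script is saved as `tex2xml.sed`: `sed -f tex2xml.sed file.tex | xml_to_text > file.txt`. This works line by line because each generated tag sits on a single line, even when the tagged text spans several lines.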

Answer 2

LuaTeX users may want to look at the spelling package. After a LaTeX run, it writes out a plain-text file that you can check with your favorite spell checker.

Answer 3

Take a look at OpenDetex. There is also a note for people running detex on Windows.
