Using texcount to count words and characters (with or without whitespace)

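The goal is a flexible scheme for automatic character counting using texcount. Specifically, the user should be able to choose every combination of (characters | words) and (including whitespace | not including whitespace).

Options:

  1. characters & not including whitespace
  2. words & not including whitespace

are already satisfied by the code shown here. The key lines are: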

\immediate\write18{texcount -char -merge -tex -sum \jobname.tex | grep -i section > \jobname Count.txt} % counts characters
%\immediate\write18{texcount -merge -tex -sum \jobname.tex | grep -i section > \jobname Count.txt} % counts words

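Question: is there a simple way to let the user choose to include whitespace in the count? (The motivation is that some web forms evaluate the count with whitespace included.)

Proposed approach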

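If we let a = non-whitespace character count and b = word count, and we are able to determine both a and b, then the two counts can be obtained with two separate texcount calls. Of course, c = word count - 1 = b - 1 would be an approximation of the whitespace character count. We should then be able to determine the total (non-whitespace + whitespace) character count as d = a + c.

The key would be to modify these blocks: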
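The arithmetic above could be sketched roughly as follows. This is only an illustration under stated assumptions: the two output file names and the helper \totalcharcount are hypothetical, and the texcount flags are simply the ones already used in the key lines above.

```latex
% Sketch only; requires compiling with --shell-escape.
% The file names CharCount.txt and WordCount.txt are hypothetical.

% a: non-whitespace character count per section
\immediate\write18{texcount -char -merge -tex -sum \jobname.tex | grep -i section > \jobname CharCount.txt}
% b: word count per section
\immediate\write18{texcount -merge -tex -sum \jobname.tex | grep -i section > \jobname WordCount.txt}

% d = a + c = a + (b - 1), evaluated with e-TeX's \numexpr.
% #1 = a (non-whitespace characters), #2 = b (words)
\newcommand*{\totalcharcount}[2]{\number\numexpr#1+#2-1\relax}

% Example: the string "This is a test." has a = 12 and b = 4,
% so \totalcharcount{12}{4} typesets 15, the length of the string.
```

The estimate assumes that words are separated by exactly one space, and a section with zero words would need a separate guard so that c does not become -1.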

\newcommand{\processCount}{%
 \newread\counts
 \def\zpar{\par}
 \openin\counts=\jobname Count.txt
 \loop
 \read\counts to \sectioncount
 \ifx\sectioncount\zpar\else
 \showcount{\sectioncount}\\
 \fi
 \ifeof\counts
 \else
 \repeat
}

\newcommand*{\showcount}[1]{%
 % e.g. 67+18+0 (1/0/0/0) S[ubs]ection: The first subsection
 \StrBehind{#1}{ection: }[\sectiontitleplusspace]
 \StrGobbleRight{\sectiontitleplusspace}{1}[\sectiontitle]
 \StrBefore{#1}{+}[\thiscount]
 \expandafter\ifcsname\sectiontitle limit\endcsname%
  \renewcommand{\limitcount}{\csname\sectiontitle limit\endcsname}%
 \else%
  \renewcommand{\limitcount}{-1}%
 \fi%
 \sectiontitle:
 {%
  \ifthenelse{\thiscount>\limitcount}{%
   \textcolor{red}{\thiscount/\limitcount}%
   \ifthenelse{\limitcount>-1}{%
    \ (over by \number\numexpr\thiscount-\limitcount\relax)%
   }{}%
  }{%
   \textcolor{green}{\thiscount/\limitcount}%
  }%
 }
}

\makeatletter
\newcommand*{\withlimit}[1]{%
 \expandafter\newcommand\csname\@currentlabelname limit\endcsname{#1}
}
\makeatother

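We also need a mechanism that lets the user indicate which type of count should be displayed, and hence the implied unit of the limit set in each section.

Answer 1

TeXcount does not support counting spaces, at least not in any reasonably accurate way. The main reason is that it would be hard for it to recognize that several consecutive spaces produce a single space in the output.

A simple way to work around this shortcoming is to count the number of words and use that as an estimate of the number of spaces, on the assumption that words are separated by single spaces.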
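One possible shape for such a mechanism is a document-wide switch that selects which of the two texcount calls is run. The switch name \ifcountchars is hypothetical, and this sketch sets the unit for the whole document rather than per section:

```latex
% Hypothetical switch: true = counts and limits are in characters,
%                      false = counts and limits are in words.
\newif\ifcountchars
\countcharstrue   % the user picks the unit here (or \countcharsfalse)

% Requires --shell-escape, as in the original key lines.
\ifcountchars
  \immediate\write18{texcount -char -merge -tex -sum \jobname.tex | grep -i section > \jobname Count.txt}
\else
  \immediate\write18{texcount -merge -tex -sum \jobname.tex | grep -i section > \jobname Count.txt}
\fi
```

Because both branches write to the same file, \processCount and \showcount could stay unchanged; the limits passed to \withlimit would then be read in whichever unit the switch selects.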

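Answer 2

TeXcount does not know how to expand macros, for example to include the words inserted from a glossary with \gls{}, or those of the glossary descriptions.

So I guess there is no "general answer" along these lines, unless you (greatly) narrow the scope of the problem.

See the extended answer https://tex.stackexchange.com/a/312299/161015 by Einar Rødland, the creator of TeXcount.

There are alternatives for automatic character counting that do not use TeXcount.

Once the pdf has been generated, you can use MS Word or Chrome to count: words, characters (without spaces), characters (with spaces).

See my answer https://tex.stackexchange.com/a/595275/161015 for more details.

Another approach is via lualatex. See https://gist.github.com/phi-gamma/2622252

It uses the callback "pre_linebreak_filter" to count words, so every macro has already been expanded. I guess the Lua code could be generalized to count spaces. (It would need to ignore the extra space that is discarded at the end of each line.)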

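Answer 3

This answer builds on my answer to "texcount mwe no longer functional after tex system update", a question also asked earlier by the OP. That answer is not sufficient for this question, because it cannot apply the counting scheme to external data (in a separate file). It provided a \countem...\endcountem pseudo-environment for counting the letters and words contained in the environment.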

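To that end, I have added a macro \countemfile, which applies the \countem approach developed in the earlier answer to LaTeX code in an external file. That macro in turn calls \filedef, a macro that fetches the external file data and reads it into a \def. I do not know what file-size limitations may exist in this regard, such as consumption of TeX engine memory or other resources.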

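The same counting modes provided in the earlier answer for environment content are now available for document text in an external file. However, unless you know that the file consists of text and/or macros that take only text arguments, it is advisable not to use the \runningcounttrue and \contentlimit features, because they may inject additional tokens into macro arguments and thereby break them. For external files with general document content, \summarycounttrue is the better choice. Even if it is not used, the file counts still contribute to the global counts, which the user can access at their discretion.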

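As in the earlier solution, \countspacestrue can be used to add the count of spaces and cat-12 tokens (punctuation and digits) to the letter count. In addition, \obeyspaces can be used if extended runs of spaces should be part of the count. As before, there is the caveat that code that changes catcodes (verbatim in particular) cannot run inside \countem, because the method previews tokens before they are executed and so fixes their catcodes before verbatim has a chance to set them.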

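Another thing to note is that tokcycle's "escape" feature (which uses delimiters to bypass processing) should probably be disabled, or its default delimiter changed. The default delimiter | occurs in regular LaTeX usage (for example, in tabular column specifications), and inside \countem it would be treated as an escape token. The feature can be disabled with \settcEscapechar{\empty}, or the delimiter can simply be changed to another token that the user can be sure does not appear in the input data.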
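The two options just described might look like this in the preamble. The first line is the one used in the MWE; the replacement character in the second is only an illustrative assumption and must be a token that never occurs in the counted input:

```latex
% Option 1 (as in the MWE): turn escaping off entirely,
% so that | in tabular column specifications is left alone.
\settcEscapechar{\empty}

% Option 2 (assumption for illustration): keep the escape feature but
% move it to a different token, here @, which must not appear anywhere
% in the text passed to \countem or \countemfile.
%\settcEscapechar{@}
```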

Finally, uncommenting the line

%\disablecountem% TO TURN OFF COUNTING, WITHOUT CHANGING DOCUMENT

disables the \countem mechanism without requiring any other changes to the source code.

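The two files that the MWE operates on are shown below. As you can see, the input files can contain sections, macros, and so on. In my view, a major limitation of this and other approaches is that only the tokens as presented can be counted. Tokens that expand into text content (ranging from \today to \tableofcontents) can be processed by \countem; however, they do not add to the word or letter counts, because the counting happens before they are expanded into their final text.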

\begin{filecontents*}[overwrite]{smallexternal}
This is  a    test.
\end{filecontents*}

\begin{filecontents*}[overwrite]{chapterexternal}
\section*{External-file word count}

Note that the above section heading is part of the count.
Also, one should set content-limit to 0 and runningcount false
  so that superscripts and colors don't interfere with, for example,
  macro arguments or starred argument invocation.
Use those with care, primarily with simple text files.

Macro output, like that from ``\today{}'' or ``\rule{3ex}{4pt}''
  is not part of the count, but the macro arguments are.

Unless escaped, things like environment names, column specifications,
  etc. count as letters and words:

\begin{tabular}{|c|c|}
  \hline
  aa & bbb\\
  \hline
  cccc & d\\
  \hline
\end{tabular}

\end{filecontents*}

Here is the MWE:

\documentclass{article}
\usepackage{tokcycle}[2021-03-10]
\usepackage{xcolor}
\newcounter{wordcount}
\newcounter{lettercount}
\newcounter{wordlimit}
\newif\ifinword
% USER PARAMETERS
\newif\ifrunningcount
\newif\ifsummarycount
\def\limitcolor{red}
\setcounter{wordlimit}{0}
%%%%%%%%%%%% ENHANCED COUNT ALGORITHM MORE STREAMLINED THAN
% https://tex.stackexchange.com/questions/577276/
% texcount-mwe-no-longer-functional-after-tex-system-update/591949#591949 
\makeatletter
\newcommand\changecolor[1]{\tctestifx{.#1}{}{\addcytoks{\color{#1}{}}%
  \tc@defx\currentcolor{#1}}}
\makeatother
\newcommand\dumpword{%
  \ifinword\stepcounter{wordcount}
    \ifrunningcount\addcytoks[x]{$^{\thewordcount,\thelettercount}$}\fi
    \ifnum\thewordcount=\value{wordlimit}\relax\changecolor{\limitcolor}\fi
  \fi%
  \inwordfalse
}
\newcommand\addletter[1]{%
  \tctestifcatnx A#1{\stepcounter{lettercount}\inwordtrue}{\dumpword}%
  \ifcountspaces\tctestifcatnx .#1{\stepcounter{lettercount}}{}\fi% NEW!!
  \addcytoks{#1}}
\newif\ifcountspaces% NEW!!
\xtokcycleenvironment\countem
  {\addletter{##1}}
  {\dumpword\groupedcytoks{\processtoks{##1}\dumpword\expandafter}\expandafter
    \changecolor\expandafter{\currentcolor}}
  {\dumpword\ifactivetok\ifnum\number`##1=32\relax
    \stepcounter{lettercount}\fi\fi\addcytoks{##1}}
  {\dumpword\ifcountspaces\stepcounter{lettercount}\fi\addcytoks{##1}}% NEW!!
  {\stripgroupingtrue
    \def\currentcolor{.}
    \setcounter{wordcount}{0}\setcounter{lettercount}{0}}
  {\dumpword\retainsum\ifsummarycount\tcafterenv{%
    \par(Wordcount=\thewordcount, Lettercount=\thelettercount)}\fi}

\newcommand\contentlimit[1]{\setcounter{wordlimit}{#1}}

\newcounter{globalwordcount}
\newcounter{globallettercount}
\newcommand\retainsum{%
  \addtocounter{globalwordcount}{\thewordcount}%
  \addtocounter{globallettercount}{\thelettercount}%
}

\newcommand\processCount{\ifnum\thegloballettercount>0%
  \par Global Wordcount=\theglobalwordcount\\
  Global Lettercount=\thegloballettercount
\fi}

\newcommand\disablecountem{\let\countem\empty\let\endcountem\empty}

%%%%%%%%%%% EXTENDED FUNCTIONALITY BASED ON BUT BEYOND
% https://tex.stackexchange.com/questions/577276/
% texcount-mwe-no-longer-functional-after-tex-system-update/591949#591949 

\makeatletter
\newcommand\xappendto[2]{\expandafter\tc@defx\expandafter
  #1\expandafter{\expandafter#1#2}}
\makeatother

\newread\srcfile
\newcommand\filedef[2]{%
  \def#2{}%
  \def\srcfileline{}%
  \openin\srcfile=#1%
  \loop\unless\ifeof\srcfile%
    \read\srcfile to\srcfileline % Reads line into \srcfileline
    \ifx\srcfileline\empty\else\xappendto#2\srcfileline\fi
  \repeat%
  \closein\srcfile%
}
\newcommand\countemfile[1]{%
  \filedef{#1}\mytextinput
  \expandafter\countem\mytextinput\endcountem
}

% EXTERNAL INPUT DATA

\begin{filecontents*}[overwrite]{smallexternal}
This is  a    test.
\end{filecontents*}

\begin{filecontents*}[overwrite]{chapterexternal}
\section*{External-file word count}

Note that the above section heading is part of the count.
Also, one should set content-limit to 0 and runningcount false
  so that superscripts and colors don't interfere with, for example,
  macro arguments or starred argument invocation.
Use those with care, primarily with simple text files.

Macro output, like that from ``\today{}'' or ``\rule{3ex}{4pt}''
  is not part of the count, but the macro arguments are.

Unless escaped, things like environment names, column specifications,
  etc. count as letters and words:

\begin{tabular}{|c|c|}
  \hline
  aa & bbb\\
  \hline
  cccc & d\\
  \hline
\end{tabular}

\end{filecontents*}

\settcEscapechar{\empty}% TURN OFF ESCAPING, SO THAT | TOKENS ARE LEFT ALONE!
\begin{document}

%\disablecountem% TO TURN OFF COUNTING, WITHOUT CHANGING DOCUMENT
\section*{Local word count}

\contentlimit{12}\runningcounttrue\countem
In this MWE, we will show both local and file counting combined in
  the global counts.
\endcountem

\contentlimit{0}\runningcountfalse\summarycounttrue
\countemfile{chapterexternal.tex}

\runningcounttrue\contentlimit{3}
\subsection*{Small file with running count, word limit}
\countemfile{smallexternal.tex}

\runningcountfalse\contentlimit{0}
\subsection*{Small file with no space counting}
\countemfile{smallexternal.tex}

\subsection*{Small file with space/punctuation counting}
\countspacestrue
\countemfile{smallexternal.tex}

\subsection*{Small file with space/punctuation counting AND obeyspaces}
\begingroup
\obeyspaces
\countemfile{smallexternal.tex}
\endgroup

\summarycountfalse\countspacesfalse

\section*{Document analysis}
\processCount

\end{document}
