How can TeX macros be excluded when counting the characters of a string in Lua?


This is a follow-up to my earlier question: Macro to replace text with a random string of the same length

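Thanks to @Mico's answer, we now have a Lua-based macro that replaces a UTF-8 string with random characters. One problem remains: when the argument contains macros, the code counts the characters of \...{} and \... themselves as part of the text to be obfuscated. This is a problem for wireframing, because the random string ends up longer than the original text. Is there a way to make xyz and \textit{xyz} produce random ASCII output of the same length?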

The MWE (thanks to @Mico) is as follows:

% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode} % for 'luacode' environment and '\luastring' macro
\begin{luacode}
function rndstring ( inputstring )
  local outputstring, choices, mm, nn
  mm = unicode.utf8.len(inputstring) -- no. of utf8-encoded characters in input string

  -- Place candidate replacement characters in a Lua table:
  choices = {
     "0", -- list substantially shortened here to reduce size
  }
  nn = #choices -- number of entries in the 'choices' table
    
   -- Generate the outputstring in a 'for' loop:
   outputstring = ""
   for i = 1 , mm do
     if unicode.utf8.sub ( inputstring , i , i ) == " "  then
         outputstring = outputstring .. " " -- preserve space char.
     else -- choose a new char randomly from 'choices' table
         outputstring = outputstring .. choices[ math.random ( nn ) ]
     end
   end
 
   return ( outputstring )
end
\end{luacode}

%% Define a LaTeX macro to invoke the Lua function
\newcommand\rndstring[1]{\directlua{tex.sprint(rndstring(\luastring{#1}))}}

\begin{document}
\ttfamily
\rndstring{This is a string.}
\rndstring{\textit{This is a String}}
%%%% These two Strings should be (but aren't) the same length
\end{document}

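Answer 1

Unfortunately, people are used to processing TeX input as regular Lua strings, and that approach always fails once TeX tokens come into play.

Even more unfortunately, LuaTeX actually ships with a built-in library for processing TeX tokens. With it, the code not only becomes much more compact, but discriminating between different types of tokens also becomes trivial.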

\documentclass{article}
\usepackage{luacode}
\begin{luacode}
local function rndstring()
    local toks = token.scan_toks()

    for n, t in ipairs(toks) do
        if t.cmdname == "letter" then
            -- random number from printable ASCII range
            local r = math.random(33, 126)
            -- create new token with that character and catcode 12
            local letter = token.create(r, 12)
            -- replace old token
            toks[n] = letter
        end
    end

    token.put_next(toks)
end

local lft = lua.get_functions_table()
lft[#lft + 1] = rndstring
token.set_lua("rndstring", #lft, "global")
\end{luacode}

\begin{document}
\ttfamily
\rndstring{This is a string.}
\rndstring{\textit{This is a String}}
\end{document}
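
As a minimal sketch of the same token-based idea applied to the counting problem in the question title (this is not part of the original answer), the hypothetical `\charcount` macro below scans its braced argument with `token.scan_toks()` and counts only the tokens whose `cmdname` is `letter` or `other_char`. Control sequences, braces, and spaces are skipped. The macro name `\charcount` and the choice to count `other_char` tokens (punctuation, digits) are assumptions made for illustration.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode}
\begin{luacode}
-- Hypothetical helper (an assumption, not from the answer above):
-- count the character tokens in the braced argument that follows,
-- ignoring control sequences, braces, and spaces.
local function charcount()
    local toks = token.scan_toks()
    local count = 0

    for _, t in ipairs(toks) do
        -- "letter" is catcode 11, "other_char" is catcode 12
        if t.cmdname == "letter" or t.cmdname == "other_char" then
            count = count + 1
        end
    end

    -- typeset the result; tostring() keeps the argument a plain string
    tex.sprint(tostring(count))
end

-- register the function as a macro, as in the answer above
local lft = lua.get_functions_table()
lft[#lft + 1] = charcount
token.set_lua("charcount", #lft, "global")
\end{luacode}

\begin{document}
\charcount{This is a String} and \charcount{\textit{This is a String}}
\end{document}
```

Because `\textit` and its braces are not character tokens, both calls should report the same count. Counting `other_char` tokens is a design choice; remove that test to count letters only.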


Answer 2

I think a pure LaTeX solution is better.

\documentclass{article}
\usepackage[T1]{fontenc}


\begin{document}

\ExplSyntaxOn

% specify what candidates are in the random replacement
\def\RandomStringASCIIRanges{
  %33-47,
  %48-57,
  58-64,
  65-90,
  91-96,
  97-122,
  %123-126
}

\seq_new:N \l_chrepl_all_repl_seq
\clist_new:N \l_chrepl_tmpa_clist
\int_new:N \l_chrepl_tmpa_int
\tl_new:N \l_chrepl_tmpa_tl
\tl_new:N \g_chrepl_tmpa_tl
\tl_new:N \g_chrepl_tmpb_tl
\tl_new:N \l_chrepl_rand_charcode_tl
\tl_new:N \l_chrepl_head_tl

\cs_set:Npn \__chrepl_parse_ascii_range:w |#1-#2| {
  \int_step_inline:nnn {#1} {#2} {
    \seq_put_right:Nn \l_chrepl_all_repl_seq {##1}
  }
}

\cs_set:Npn \__chrepl_parse_ascii_range:n #1 {
  \__chrepl_parse_ascii_range:w |#1|
}

% parse the ranges
\clist_set:NV \l_chrepl_tmpa_clist \RandomStringASCIIRanges
\clist_map_function:NN \l_chrepl_tmpa_clist \__chrepl_parse_ascii_range:n

% construct an intarray for fast access
\intarray_new:Nn \g_chrepl_repl_intarray {\seq_count:N \l_chrepl_all_repl_seq}
\int_set:Nn \l_chrepl_tmpa_int {1} % loop index
\seq_map_inline:Nn \l_chrepl_all_repl_seq {
  \intarray_gset:Nnn \g_chrepl_repl_intarray {\l_chrepl_tmpa_int} {#1}
  \int_incr:N \l_chrepl_tmpa_int
}


\cs_set:Npn \__chrepl_temp_var:n #1 {
  __g_chrepl_temp_#1_tl
}

\cs_set:Npn \__chrepl_group:n #1 {
  \exp_not:n { {#1} }
}

% a recursive replacement algorithm
\cs_set:Npn \chrepl_repl:Nnn #1#2#3 {
  \group_begin:
  \tl_if_empty:nF {#2} {
    % check if head is space
    % if head is space, insert it back
    \tl_if_head_is_space:nTF {#2} {
      \tl_gput_right:Nn #1 {\ }
      % recursive call (skip spaces)
      \exp_args:Nnx \chrepl_repl:Nnn #1 {\tl_trim_spaces:n {#2}} {#3}
    } {      
      \tl_if_head_is_group:nTF {#2} {
        % the results in this group needs to be written to a unique temp variable
        % clear the temp var. corresponding to this level
        \tl_gclear:c {\__chrepl_temp_var:n {#3}}
        \chrepl_repl:cxx {\__chrepl_temp_var:n {#3}} {\tl_head:n {#2}} {\int_eval:n {#3 + 1}}
        \tl_set_eq:Nc \l_chrepl_tmpa_tl {\__chrepl_temp_var:n {#3}}
        \tl_gput_right:Nx #1 {
          \exp_args:NV  \__chrepl_group:n \l_chrepl_tmpa_tl
        }
      } {
        % extract the head
        \tl_set:Nx \l_chrepl_head_tl {\tl_head:n {#2}}
        \tl_if_empty:NF \l_chrepl_head_tl {
          % if head is control sequence, insert it back
          \exp_args:NV \token_if_cs:NTF \l_chrepl_head_tl {
            \tl_show:N \l_chrepl_head_tl
            \tl_gput_right:NV #1 \l_chrepl_head_tl
          } {
            % otherwise, do replacement
            % randomly pick a charcode from the intarray
            \tl_set:Nx \l_chrepl_rand_charcode_tl {\intarray_rand_item:N \g_chrepl_repl_intarray}
            % generate the corresponding character
            \tl_gput_right:Nx #1 {\char_generate:nn {\l_chrepl_rand_charcode_tl} {12}}
          }
        }
      }
      % recursive call
      \exp_args:Nnx \chrepl_repl:Nnn #1 {\tl_tail:n {#2}} {#3}
    }
  }
  \group_end:
}

\cs_generate_variant:Nn \chrepl_repl:Nnn {cxx}

% user function
\newcommand{\rndstr}[1]{
  \tl_gclear:N \g_chrepl_tmpa_tl % used to store results
  \chrepl_repl:Nnn \g_chrepl_tmpa_tl {#1} {1}
  \tl_show:N \g_chrepl_tmpa_tl
  \tl_use:N \g_chrepl_tmpa_tl
}

\ExplSyntaxOff


\texttt{\rndstr{Hello World}}

\texttt{\rndstr{Hello Владимир öäüß}}

\texttt{\rndstr{this \textsl{ab{\huge\bfseries cdef}gh}} nested groups.}

\texttt{\rndstr{this {ab{cdef}gh}} nested groups.}

\texttt{\rndstr{this abcdefgh nested groups.}}

\texttt{\rndstr{this \{abcdefgh\} nested groups.}}

\texttt{\rndstr{Once upon a time, there was ...}}

\end{document}
