How to avoid code blocks being split across two pages with minted

Answer 1

Manual page breaks to prevent page breaks inside functions

You can achieve what you want by changing the internal function \FancyVerbGetLine in fancyvrb. The code is attached below.

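A page break is triggered by a line in the Python file that contains the text #PB####. I chose this sequence because Python parses it as a comment, so it does not interfere with normal execution. An example is shown in test.py.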

I have tried to explain how it works in the code comments. If you have further questions, please let me know.
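The claim that the marker is harmless to Python can be checked with the standard tokenize module. This is only a small sketch; the names MARKER, comment_tokens and sample are made up for illustration and are not part of the answer's code.

```python
import io
import tokenize

MARKER = "#PB####"  # the page-break marker used in this answer


def comment_tokens(source):
    """Return the text of every comment token in the given Python source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.COMMENT]


# A tiny hypothetical source file with the marker on a line of its own.
sample = "x = 1\n" + MARKER + "\ny = x + 1\n"

# The marker is tokenized as a single comment ...
print(comment_tokens(sample))

# ... and the code around it still runs normally.
namespace = {}
exec(sample, namespace)
print(namespace["y"])
```

Because the marker is an ordinary comment, it can stay in the source file permanently without affecting the program.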

LaTeX code

\documentclass{article}
\usepackage{minted}
\usepackage[a4paper, portrait, left=1.5cm, right=1.5cm, top=20mm, bottom=20mm]{geometry}
\usepackage{tcolorbox}


\makeatletter

% the pygments output for #PB#### is stored here (we use \detokenize{} to convert it to "string")
\edef\fancyvrb@pb@match{\detokenize{\PYG{c+c1}{\PYGZsh{}PB\PYGZsh{}\PYGZsh{}\PYGZsh{}\PYGZsh{}}}}

\begingroup
\catcode`\^^M=\active%
\gdef\FancyVerbGetLine#1^^M{%
  \@nil%
  \FV@CheckEnd{#1}%
  \ifx\@tempa\FV@EnvironName%            % True if end is found
    \ifx\@tempb\FV@@@CheckEnd\else\FV@BadEndError\fi%
    \let\next\FV@EndScanning%
  \else%
    \def\FV@Line{#1}%
    \def\next{\FV@PreProcessLine\FV@GetLine}%
    \edef\fancyvrb@cur@line@detok{\detokenize{#1}}% convert the current fancyvrb line to string
    \ifx\fancyvrb@pb@match\fancyvrb@cur@line@detok\newpage\def\FV@Line{}\fi% if the current fancyvrb line is equal to \fancyvrb@pb@match, insert a page break and empty the line
  \fi%
  \next}%
\endgroup

\makeatother


\begin{document}

\inputminted[linenos]{python}{test.py}


\end{document}

test.py

class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def myfunc(self):
    print("Hello my name is " + self.name)

p1 = Person("John", 36)

p1.age = 40

print(p1.age)


#PB####

class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def myfunc(self):
    print("Hello my name is " + self.name)

p1 = Person("John", 36)

p1.age = 40

print(p1.age)

Automatic method

One way to automate this is to use Python's ast module to obtain the abstract syntax tree (AST) and use it to tell function lines apart from all other lines.
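As a minimal sketch of that idea, the snippet below lists the line ranges of all functions that are not nested inside another function. The helper name outermost_function_ranges and the sample source are assumptions made for illustration. It needs Python 3.8 or later, because it reads end_lineno from the AST nodes.

```python
import ast


def outermost_function_ranges(source):
    """Return sorted (first_line, last_line) pairs, 1-based, for functions
    that are not nested inside another function."""
    tree = ast.parse(source)
    ranges = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            ranges.append((node.lineno, node.end_lineno))
    ranges.sort()
    # Drop every range that lies inside an earlier (enclosing) range.
    result = []
    for start, end in ranges:
        if result and start <= result[-1][1]:
            continue
        result.append((start, end))
    return result


# Hypothetical input: a function with a nested helper, and a class method.
sample = '''import math

def area(r):
    def square(x):
        return x * x
    return math.pi * square(r)

class Circle:
    def __init__(self, r):
        self.r = r
'''

print(outermost_function_ranges(sample))  # [(3, 6), (9, 10)]
```

The nested function square is absorbed into the range of area, while the method __init__ is reported on its own; every line outside these ranges belongs to a non-function segment.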

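The source code of the script, convert_to_tex.py, is below. Run it with two arguments: python3 convert_to_tex.py input_py_file output_tex_file. Python 3.8 or later is recommended, because the script relies on end_lineno, which AST nodes provide from version 3.8 on.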

# convert_to_tex.py
import ast
import sys
from dataclasses import dataclass

in_fn = sys.argv[1]
out_fn = sys.argv[2]

with open(in_fn) as f:
    src = f.read()

src_lines = src.split('\n')

tree = ast.parse(src)

func_line_nos = []
function_flag = [0] * len(src_lines) # used to avoid repeating segments in nested functions

for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        if not function_flag[node.lineno-1]:
            func_line_nos.append((node.lineno-1, node.end_lineno-1))
            for i in range(node.lineno-1, node.end_lineno):
                function_flag[i] = 1

# ast.walk yields nodes in no specified order, so sort the ranges by line number
func_line_nos.sort()

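@dataclass
class NonFuncSeg:
    data: list   # source lines of the segment
    start: int   # 0-based index of the segment's first line

@dataclass
class FuncSeg:
    data: list   # source lines of the segment
    start: int   # 0-based index of the segment's first line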

file_seg = []

last_pos = 0
for start, end in func_line_nos:
    if last_pos < start:
        file_seg.append(NonFuncSeg(src_lines[last_pos:start], last_pos))
    file_seg.append(FuncSeg(src_lines[start:end+1], start))
    last_pos = end+1

if last_pos != len(src_lines):
    file_seg.append(NonFuncSeg(src_lines[last_pos:], last_pos))

assert sum(len(x.data) for x in file_seg) == len(src_lines) # make sure we do not miss any lines

with open(out_fn, 'w') as outfile:
    for item in file_seg:
        if isinstance(item, NonFuncSeg):
            outfile.write(
(r'''
\begin{CodeCanBreak}{%d}
%s
\end{CodeCanBreak}
''' % (item.start + 1, '\n'.join(item.data))).lstrip()
            ) 
        else:
            outfile.write(
(r'''
\begin{CodeNoBreak}{%d}
%s
\end{CodeNoBreak}
''' % (item.start + 1, '\n'.join(item.data))).lstrip()
            )

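This script splits the Python file into function segments and non-function segments. The two kinds of segment are placed in two LaTeX listing environments, CodeCanBreak and CodeNoBreak, whose purposes are evident from their names. Sample output is shown below. Note that the number passed to each environment is used to restore the correct line numbers.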

\begin{CodeCanBreak}{1}
#! /usr/bin/env python3

"""Base16, Base32, Base64 (RFC 3548), Base85 and Ascii85 data encodings"""

# Modified 04-Oct-1995 by Jack Jansen to use binascii module
# Modified 30-Dec-2003 by Barry Warsaw to add full RFC 3548 support
# Modified 22-May-2007 by Guido van Rossum to use bytes everywhere

import re
import struct
import binascii


__all__ = [
    # Legacy interface exports traditional RFC 2045 Base64 encodings
    'encode', 'decode', 'encodebytes', 'decodebytes',
    # Generalized interface for other encodings
    'b64encode', 'b64decode', 'b32encode', 'b32decode',
    'b32hexencode', 'b32hexdecode', 'b16encode', 'b16decode',
    # Base85 and Ascii85 encodings
    'b85encode', 'b85decode', 'a85encode', 'a85decode',
    # Standard Base64 encoding
    'standard_b64encode', 'standard_b64decode',
    # Some common Base64 alternatives.  As referenced by RFC 3458, see thread
    # starting at:
    #
    # http://zgp.org/pipermail/p2p-hackers/2001-September/000316.html
    'urlsafe_b64encode', 'urlsafe_b64decode',
    ]


bytes_types = (bytes, bytearray)  # Types acceptable as binary data

\end{CodeCanBreak}
\begin{CodeNoBreak}{34}
def _bytes_from_decode_data(s):
    if isinstance(s, str):
        try:
            return s.encode('ascii')
        except UnicodeEncodeError:
            raise ValueError('string argument should contain only ASCII characters')
    if isinstance(s, bytes_types):
        return s
    try:
        return memoryview(s).tobytes()
    except TypeError:
        raise TypeError("argument should be a bytes-like object or ASCII "
                        "string, not %r" % s.__class__.__name__) from None
\end{CodeNoBreak}
\begin{CodeCanBreak}{47}
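The line-number arguments in the generated file can be sanity-checked: each environment's number should equal the previous environment's number plus the count of lines it contains (above, 34 plus 13 lines gives 47). The following is a rough sketch of such a check, not part of the original script. The names segment_starts and starts_are_consistent are hypothetical, and the check assumes the Python source itself never contains a line that looks like one of the environment delimiters.

```python
import re

BEGIN = re.compile(r"\\begin\{(CodeCanBreak|CodeNoBreak)\}\{(\d+)\}")
END = re.compile(r"\\end\{(CodeCanBreak|CodeNoBreak)\}")


def segment_starts(tex):
    """Return (declared_start, line_count) for each complete environment."""
    segments = []
    start = None
    count = 0
    for line in tex.split("\n"):
        begin = BEGIN.fullmatch(line)
        if begin:
            start, count = int(begin.group(2)), 0
        elif END.fullmatch(line):
            if start is not None:
                segments.append((start, count))
            start = None
        elif start is not None:
            count += 1  # a source line inside the current environment
    return segments


def starts_are_consistent(tex):
    """True if every start equals the previous start plus its line count."""
    segments = segment_starts(tex)
    return all(a_start + a_len == b_start
               for (a_start, a_len), (b_start, _) in zip(segments, segments[1:]))


# Hypothetical generated output for a six-line Python file.
sample = r"""\begin{CodeCanBreak}{1}
import re

\end{CodeCanBreak}
\begin{CodeNoBreak}{3}
def f():
    return 1
\end{CodeNoBreak}
\begin{CodeCanBreak}{5}

x = f()
\end{CodeCanBreak}
"""

print(segment_starts(sample))         # [(1, 2), (3, 2), (5, 2)]
print(starts_are_consistent(sample))  # True
```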

You can \input this generated LaTeX code together with suitable environment definitions:

\documentclass{article}
\usepackage{minted}
\usepackage[a4paper, portrait, left=1.5cm, right=1.5cm, top=20mm, bottom=20mm]{geometry}
\usepackage{tcolorbox}
\tcbuselibrary{skins,minted,breakable}


\begin{document}



\tcbset{
  listingbase/.style n args={1}{
    enhanced,
    boxrule=0pt,
    top=0pt,
    bottom=0.4\baselineskip, % adjust accordingly
    left=0pt,
    right=0pt,
    colback=white,
    boxsep=0pt,
    nobeforeafter,
    before={\par\noindent},
    frame hidden,
    listing only, 
    listing engine=minted, 
    minted language=python,
    minted options={
        linenos,
        numbersep=0.5em,
        firstnumber={#1}
    }
  }
}

\newtcblisting{CodeCanBreak}[1]{listingbase=#1,breakable,}
\newtcblisting{CodeNoBreak}[1]{listingbase=#1}

\bgroup
\input{example.tex}
\egroup

\end{document}

Answer 2

For reference, the piton package provides a way to typeset Python listings with LuaLaTeX.

The latest version (2.7a, 2024-03-30) provides a key split-on-empty-lines. When that key is in force, the code may be broken only on empty lines.

Here is an example.

\begin{filecontents*}{myfile.txt}
def arctan(x,n=10):
    """Compute the mathematical value of arctan(x)"""
    if x < 0:
        return -arctan(-x) # recursive call
    elif x > 1: 
        return pi/2 - arctan(1/x) 
    else: 
        s = 0
        for k in range(n):
            s += (-1)**k/(2*k+1)*x**(2*k+1)
        return s 

def arctan(x,n=10):
    """Compute the mathematical value of arctan(x)"""
    if x < 0:
        return -arctan(-x) # recursive call
    elif x > 1: 
        return pi/2 - arctan(1/x) 
    else: 
        s = 0
        for k in range(n):
            s += (-1)**k/(2*k+1)*x**(2*k+1)
        return s 

def arctan(x,n=10):
    """Compute the mathematical value of arctan(x)"""
    if x < 0:
        return -arctan(-x) # recursive call
    elif x > 1: 
        return pi/2 - arctan(1/x) 
    else: 
        s = 0
        for k in range(n):
            s += (-1)**k/(2*k+1)*x**(2*k+1)
        return s 
\end{filecontents*}

\documentclass{article}
\usepackage{geometry}
\geometry{textheight=12cm}
\usepackage{piton}

\begin{document}

\PitonInputFile[split-on-empty-lines]{myfile.txt}

\end{document}

In this example, the code is broken between the second and the third function.
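To preview where such breaks could fall in a given file, one can list its empty lines. The sketch below models only the rule stated above and is not piton's implementation; the helper name candidate_break_lines is hypothetical, and treating whitespace-only lines as empty is an assumption of this sketch.

```python
def candidate_break_lines(source):
    """Return the 1-based numbers of the empty lines in the source."""
    return [number
            for number, line in enumerate(source.splitlines(), start=1)
            if line.strip() == ""]


# Hypothetical input: three short functions separated by empty lines.
sample = (
    "def f():\n"
    "    return 1\n"
    "\n"
    "def g():\n"
    "    return 2\n"
    "\n"
    "def h():\n"
    "    return 3\n"
)

print(candidate_break_lines(sample))  # [3, 6]
```

A function that contains an empty line of its own would show up in this list as well, so keeping function bodies free of empty lines matters when relying on this key alone.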

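It is also possible to allow page breaks elsewhere. For example, when the key splittable=3 is added, each Python function definition may be broken anywhere except within its first 3 lines and its last 3 lines (this avoids widows and orphans).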

To try this, replace the instruction

\PitonInputFile[split-on-empty-lines]{myfile.txt}

in the example above with:

\PitonInputFile[split-on-empty-lines,splittable=3]{myfile.txt}
