Regex to extract Pandoc tables

Minimal data for the regex

\documentclass{article}
\begin{document}

-------------------------------------------------------------
A           B       C               D
Header      Aligned Aligned         Aligned
----------- ------- --------------- -------------------------
First       row     12.0            Example of a row that
                                    spans multiple lines.

Second      row     5.0             Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

\section{Lorem Ipsun}
Hello world!

-------------------------------------------------------------
A           B       C               D
Header      Aligned Aligned         Aligned
----------- ------- --------------- -------------------------
First       row     12.0            Example of a row that
                                    spans multiple lines.

Second      row     5.0             Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

\end{document}

Desired output

-------------------------------------------------------------
A           B       C               D
Header      Aligned Aligned         Aligned
----------- ------- --------------- -------------------------
First       row     12.0            Example of a row that
                                    spans multiple lines.

Second      row     5.0             Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

-------------------------------------------------------------
A           B       C               D
Header      Aligned Aligned         Aligned
----------- ------- --------------- -------------------------
First       row     12.0            Example of a row that
                                    spans multiple lines.

Second      row     5.0             Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

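I want to extract all tables from a LaTeX document.

Pseudocode

  1. Match a run of more than 7 consecutive "-" characters and everything that follows up to "Table:". Include the line containing "Table:", but nothing after that line.
  2. Repeat 1) until the end of the file.

My attempt

Step one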

[-]{10,777}$

Now include everything except the word "Table:"

((?!Table:).)*$

and finally include everything on the line that contains "Table:"

^(?=.*?\Table:\b)

All combined

[-]{10,777}$((?!Table:).)*$^(?=.*?\Table:\b)

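This does not work. Something is wrong, but I cannot tell what.

How can I match environments like this with a regex in Perl?

Answer 1

I will edit this if you update your question, but I think you are looking for something like this: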

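perl -l -007 -ne '@F=(/-{7,}.*?Table:.*?\n(?=\n)/gsm); print join "\n", @F' file.tex

Explanation

  • -l: adds a newline to each print call. It is given before -007 because -l copies whatever the input record separator is at that moment into the output record separator; written after -007, it would append a BEL character instead of a newline.
  • -007: sets the input record separator to the BEL character (octal 007). That character is not expected in a .tex file, so the whole file is read as one record. -0777 is the conventional switch for slurping a file.
  • -ne: -n processes the input file and -e runs the script that follows.
  • @F=(/pattern/gsm): saves all matches of pattern in the array @F. g turns on global matching, s makes . match newlines, and m makes ^ and $ match at line boundaries (the pattern uses neither, so m has no effect here).
  • -{7,}.*?Table:.*?\n(?=\n): matches 7 or more -, then everything up to the first Table: ( .*?Table:), then everything up to the first pair of consecutive newlines ( .*?\n(?=\n)). I use the lookahead only to avoid printing both newlines.
  • print join "\n", @F: prints each element of the array @F, separated by newlines.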
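
For comparison, the same pattern can be tried outside Perl. The following is a minimal Python sketch and not part of the original answer; the function name extract_tables and the shortened sample text are assumptions made for illustration. re.DOTALL plays the role of Perl's s flag and re.findall the role of g.

```python
import re

# Same pattern as the Perl one-liner: a run of 7+ dashes, then lazily up to
# the first "Table:", then lazily up to a newline that is followed by a
# second newline (the lookahead keeps that blank line out of the match).
TABLE_RE = re.compile(r"-{7,}.*?Table:.*?\n(?=\n)", re.DOTALL)


def extract_tables(text):
    """Return every table (rules, rows and caption) found in text."""
    return TABLE_RE.findall(text)


if __name__ == "__main__":
    # Hypothetical, shortened sample input (not the data from the question).
    sample = (
        "\\begin{document}\n"
        "\n"
        "----------\n"
        "A    B\n"
        "---- ----\n"
        "1    2\n"
        "----------\n"
        "\n"
        "Table: A caption that\n"
        "spans two lines.\n"
        "\n"
        "\\section{Text}\n"
        "Hello world!\n"
        "\\end{document}\n"
    )
    for table in extract_tables(sample):
        print(table)
```

As with the Perl version, a caption is only matched if a blank line follows it, so a table whose caption is the very last line of the file would be missed.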

Answer 2

sed -n '/-\{10,777\}/,/^\s*Table:/p' LaTeX.doc

If you want a blank line after each table:

sed -n '/^\s*Table:/G;/-\{10,777\}/,/^\s*Table:/p' LaTeX.doc

Or

sed '/-\{10,777\}/,/^\s*Table:/! d;/^\s*Table:/G' LaTeX.doc
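
The range address used by these sed commands can be written out as a small state machine. The following Python sketch is an illustration only and not part of the original answer; the function name tables_by_range and its blank_after parameter are hypothetical. It starts copying at a line containing at least 10 dashes and stops after the next line that begins with optional whitespace and Table:, so, like the sed version, it does not keep continuation lines of a multi-line caption.

```python
import re

START = re.compile(r"-{10,}")      # a rule of at least 10 dashes
END = re.compile(r"^\s*Table:")    # the caption line that closes a table


def tables_by_range(lines, blank_after=False):
    """Yield the lines from each dashed rule through the next 'Table:' line.

    lines must not carry trailing newlines (e.g. use str.splitlines()).
    With blank_after=True an empty line is emitted after each caption line,
    roughly what the G command adds in the sed variants.
    """
    in_table = False
    for line in lines:
        if not in_table:
            # Outside a table: wait for the opening rule.
            if START.search(line):
                in_table = True
                yield line
        else:
            # Inside a table: copy everything, stop after the caption line.
            yield line
            if END.search(line):
                in_table = False
                if blank_after:
                    yield ""


if __name__ == "__main__":
    # Hypothetical, shortened sample input.
    sample = [
        "intro text",
        "----------",
        "A    B",
        "---- ----",
        "1    2",
        "----------",
        "",
        "Table: A caption that",
        "spans two lines.",
        "",
        "Hello world!",
    ]
    for out in tables_by_range(sample, blank_after=True):
        print(out)
```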
