Minimal data for the regex
\documentclass{article}
\begin{document}
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
\section{Lorem Ipsun}
Hello world!
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
\end{document}
Desired output
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
-------------------------------------------------------------
A B C D
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
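I want to extract all tables from a LaTeX document.
Pseudo-code
- Match a run of more than 7 consecutive "-", then everything up to "Table:". Include the line containing "Table:", but nothing after that line.
- Repeat step 1 until the end of the file.
My attempt
First step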
[-]{10,777}$
Now include everything except the word "Table:"
((?!Table:).)*$
and finally include everything on the line that contains "Table:"
^(?=.*?\Table:\b)
All combined
[-]{10,777}$((?!Table:).)*$^(?=.*?\Table:\b)
This does not work. Something is wrong, but I cannot tell what.
How can I write a clean regex for an environment like this in Perl?
Answer 1
If you update your question I will edit this, but I think you are looking for something like this:
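perl -l -0777 -ne '@F=(/-{7,}.*?Table:.*?\n(?=\n)/gsm); print join "\n", @F' file.tex
Explanation
-0777 : slurp the whole file as a single record.
-l : add a newline to each print call. It has to come before -0777, because -l copies whatever input record separator is in effect at that point into the output record separator.
-n : loop over the input without printing it automatically.
-e : run the script given on the command line.
@F=(/pattern/gsm) : save all matches of pattern in the array @F. The g flag matches globally, s lets . match newlines, and m lets ^ and $ match at line boundaries (this pattern uses neither anchor, so m has no effect here).
-{7,}.*?Table:.*?\n(?=\n) : match 7 or more -, then everything up to the first Table: (.*?Table:), then everything up to the first pair of consecutive newlines (.*?\n(?=\n)). I use the lookahead only to avoid printing both newlines.
print join "\n", @F : print each element of the array @F, separated by newlines.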
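For readers who want to try the same pattern outside a one-liner, here is a minimal sketch in Python rather than Perl. The function name extract_tables is hypothetical, and the extra \Z alternative in the lookahead is an assumption added so that a caption at the very end of the file (with no blank line after it) is still matched; the rest of the pattern is the one explained above.

```python
import re

# Same idea as the Perl pattern: 7+ dashes, lazily up to the first "Table:",
# then lazily up to a newline that is followed by a blank line.
# Assumption (not in the Perl pattern): "\Z" also accepts end of input.
TABLE_RE = re.compile(r"-{7,}.*?Table:.*?\n(?=\n|\Z)", re.DOTALL)


def extract_tables(text: str) -> list[str]:
    """Return each block from the opening dashed rule through the caption paragraph."""
    # re.DOTALL plays the role of Perl's /s; findall plays the role of /g.
    return [match.rstrip("\n") for match in TABLE_RE.findall(text)]
```

Reading the file with open("file.tex").read() and passing the result to extract_tables corresponds to slurping the file in the Perl version. Like the Perl pattern, this relies on a blank line ending the caption paragraph.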
Answer 2
sed -n '/-\{10,777\}/,/^\s*Table:/p' LaTeX.doc
If you want a blank line after each table:
sed -n '/^\s*Table:/G;/-\{10,777\}/,/^\s*Table:/p' LaTeX.doc
Or:
sed '/-\{10,777\}/,/^\s*Table:/! d;/^\s*Table:/G' LaTeX.doc
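The range addresses used in the sed commands above work line by line: printing starts on a line containing a long run of dashes and stops on the next line that begins with Table:. As a rough illustration of that logic, here is a sketch in Python (not sed); the name table_lines is hypothetical. Note that, like the sed range, it stops on the Table: line itself, so a caption that continues on the following line is cut off.

```python
import re
from typing import Iterable, Iterator

RULE = re.compile(r"-{10,}")         # range start: a run of 10 or more dashes
CAPTION = re.compile(r"^\s*Table:")  # range end: the caption line


def table_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines from each dashed rule through the next 'Table:' line."""
    inside = False
    for line in lines:
        if inside:
            yield line
            # As in a sed range, the end pattern is only tested on lines
            # after the one that opened the range.
            if CAPTION.match(line):
                inside = False
        elif RULE.search(line):
            inside = True
            yield line
```

Calling "\n".join(table_lines(text.splitlines())) on the file contents gives roughly what the first sed command prints.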