The following Markdown, which contains HTML table tags, is not rendered correctly when converted to LaTeX with Pandoc.
file.md:
<table>
<tr>
<th>Alfreds Futterkiste</th>
<th>Maria Anders</th>
<th>Germany</th>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
</table>
| Alfreds Futterkiste | Maria Anders | Germany |
|---------------------|--------------|---------|
| Centro comercial Moctezuma | Francisco Chang | Mexico |
pandoc file.md -s -t latex
The result is (output trimmed to the relevant part):
Alfreds Futterkiste
Maria Anders
Germany
Centro comercial Moctezuma
Francisco Chang
Mexico
\begin{longtable}[]{@{}lll@{}}
\toprule
Alfreds Futterkiste & Maria Anders & Germany \\
\midrule
\endhead
Centro comercial Moctezuma & Francisco Chang & Mexico \\
\bottomrule
\end{longtable}
Adding the --verbose option to pandoc shows that it ignores the HTML tags:
[INFO] Not rendering RawBlock (Format "html") "<table>"
[INFO] Not rendering RawBlock (Format "html") "<tr>"
[INFO] Not rendering RawBlock (Format "html") "<td>"
[INFO] Not rendering RawBlock (Format "html") "</td>"
[INFO] Not rendering RawBlock (Format "html") "<td>"
[INFO] Not rendering RawBlock (Format "html") "</td>"
[INFO] Not rendering RawBlock (Format "html") "<td>"
[INFO] Not rendering RawBlock (Format "html") "</td>"
[INFO] Not rendering RawBlock (Format "html") "</tr>"
[INFO] Not rendering RawBlock (Format "html") "<tr>"
[INFO] Not rendering RawBlock (Format "html") "<td>"
[INFO] Not rendering RawBlock (Format "html") "</td>"
[INFO] Not rendering RawBlock (Format "html") "<td>"
[INFO] Not rendering RawBlock (Format "html") "</td>"
[INFO] Not rendering RawBlock (Format "html") "<td>"
[INFO] Not rendering RawBlock (Format "html") "</td>"
[INFO] Not rendering RawBlock (Format "html") "</tr>"
[INFO] Not rendering RawBlock (Format "html") "</table>"
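How can I make Pandoc treat HTML tables in Markdown the same way it treats pipe tables?
I don't want to use pipe tables because they are hard for technical writers to edit and work with.
Answer 1
Pandoc's default behavior is to preserve raw HTML as raw content, which writers for non-HTML formats such as LaTeX then skip. You can force it to be parsed, for example with a Lua filter. Put the following code in a file named parse-html.lua: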
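-- Re-parse raw HTML blocks as HTML so they become regular pandoc blocks;
-- raw blocks in any other format are returned unchanged.
function RawBlock (raw)
  return raw.format:match 'html'
    and pandoc.read(raw.text, 'html').blocks
    or raw
end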
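Then call pandoc as follows. The markdown_in_html_blocks extension is disabled so that the whole table reaches the filter as a single raw HTML block rather than tag by tag: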
pandoc --lua-filter=parse-html.lua --from=markdown-markdown_in_html_blocks ...
Your table should now be rendered as a proper LaTeX table.
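The steps above can be tried end to end with a small script. This is only a sketch: the scratch directory and the names demo.md and demo.tex are made up for illustration, and it assumes a pandoc version with Lua filter support is on the PATH (it skips the conversion otherwise). Only parse-html.lua and the pandoc flags come from the answer.

```shell
# Hypothetical scratch directory and file names, used for illustration only.
workdir=$(mktemp -d)

# Sample input: the HTML table from the question.
cat > "$workdir/demo.md" <<'EOF'
<table>
<tr>
<th>Alfreds Futterkiste</th>
<th>Maria Anders</th>
<th>Germany</th>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
</table>
EOF

# The filter from the answer: re-parse raw HTML blocks as HTML.
cat > "$workdir/parse-html.lua" <<'EOF'
function RawBlock (raw)
  return raw.format:match 'html'
    and pandoc.read(raw.text, 'html').blocks
    or raw
end
EOF

if command -v pandoc >/dev/null 2>&1; then
  pandoc "$workdir/demo.md" -t latex \
    --lua-filter="$workdir/parse-html.lua" \
    --from=markdown-markdown_in_html_blocks \
    -o "$workdir/demo.tex"
  # Print how many table environments ended up in the LaTeX output;
  # a non-zero count suggests the HTML table was parsed.
  grep -c 'begin{longtable}' "$workdir/demo.tex"
else
  echo "pandoc not found on PATH; skipping conversion" >&2
fi
```

Writing to a temporary directory keeps the experiment away from real documents; once the output looks right, the same two flags can be added to the existing pandoc command.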
Answer 2
Building on tarleb's answer, you may be able to process Markdown inside HTML blocks with this filter:
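function RawBlock (raw)
  if raw.format:match 'html' then
    -- Parse the raw HTML into regular pandoc blocks.
    local blocks = pandoc.read(raw.text, 'html').blocks
    for i = 1, #blocks do
      blocks[i] = pandoc.walk_block(blocks[i],
        {
          -- Keep line breaks so the text can be re-read as Markdown.
          SoftBreak = function(el)
            return pandoc.Str("\n")
          end,
          -- Re-parse the plain text of each element as Markdown.
          Plain = function(el)
            return pandoc.read(pandoc.utils.stringify(el), 'markdown').blocks
          end
        }
      )
    end
    return blocks
  end
  -- Leave raw blocks in other formats unchanged.
  return raw
end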
As with the previous approach, markdown_in_html_blocks needs to be disabled:
pandoc --lua-filter=parse-html.lua --from=markdown-markdown_in_html_blocks ...