Tighter formatting of Pandoc Markdown tables for xelatex PDF output?

I have a pipeline that starts from a .csv file (the complete MWE is given as a Python script below):

Item #/offset,Description/name,Random info,Default value
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0

...which I then convert to a Markdown "grid" table using Python pandas, choosing to "format" some columns with backticks so that I get a monospaced font, and insert into a Markdown document:

---
title: "My table description document"
author: John Doe

---

Here is the table, discussed in this document:

+-----------------+--------------------------+----------------------------------+-----------------+
| Item #/offset   | Description/name         | Random info                      |   Default value |
+=================+==========================+==================================+=================+
| `0`             | "`Sum Difficult Copper`" | Recover; sculpture               |               0 |
+-----------------+--------------------------+----------------------------------+-----------------+
| `1`             | "`Joy Committee Tick`"   | Reference; ball average can loop |               0 |
+-----------------+--------------------------+----------------------------------+-----------------+

Notice the table formatting!

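...and finally, pandoc converts this Markdown to PDF via xelatex, producing the following output:

(image: pdf_output)

First of all, this table content could definitely fit on one line per row; I don't know why "Sum Difficult Copper" gets broken/line-wrapped? Also, the numbers wrapped in backticks stick to the "top" of the cell and are left-aligned, while the plain numbers stick to the "bottom" and are right-aligned?

The red line-break markers show that the verbatim wrapping definitely has an effect; however, "Reference; ball average can loop" is also line-broken, and it is normal text?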

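My questions are:

  • Is there some simple solution at the LaTeX level (say, including some package) that would "prefer" fitting the cell content so that each row fits on one line, regardless of whether I use backticks for monospace inside the table?
    • (Side question, and I know this is not the right forum for it: is there a way to control the left/right alignment of cells, or even the top/bottom alignment, maybe already in the Python .to_markdown call?)
  • Is there some simple solution at the LaTeX level (say, including some package) so that I get table cell borders ("outer border and all inner lines", as it is called in LibreOffice Calc)?

Here is the Python script, table_to_md.py: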

#!/usr/bin/env python3

import os
import pandas as pd
import subprocess

PANDOC = "pandoc"
TABLE_FNAME = "table.csv"
TABLE_DATA="""\
Item #/offset,Description/name,Random info,Default value
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0
"""
YAML_FNAME = "pandoc_style.yaml"
YAML_DATA = r"""
pdf-engine: xelatex
filters:
- pandoc-crossref
- citeproc

metadata:
  link-citations: true # works fine here!

listings: true

variables:
  geometry: margin=2cm
  classoption: table
  documentclass: extarticle
  numbersections: true
  papersize: a4
  fontsize: 12pt
  unicode-math: bold-style=ISO
  listings: true
  header-includes: |
    \rowcolors{2}{gray!10}{gray!25}
    \usepackage{fontspec}
    \setmainfont[Ligatures=TeX]{CMU Serif}
    \unimathsetup{bold-style=ISO}
    \lstset{% for listings
      basicstyle=\ttfamily,
      breaklines=true,
      postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
    }
"""


CONTENT_MD_FNAME="my_table_description.md"
CONTENT_TEX_FNAME="my_table_description.tex"
CONTENT_PDF_FNAME="my_table_description.pdf"
CONTENT_MD_DATA="""\
---
title: "My table description document"
author: John Doe

---

Here is the table, discussed in this document:

{}

Notice the table formatting!
"""


with open(TABLE_FNAME, 'w', encoding='utf-8') as f:
  f.write(TABLE_DATA)
with open(YAML_FNAME, 'w', encoding='utf-8') as f:
  f.write(YAML_DATA)

df_table = pd.read_csv(TABLE_FNAME)

# make these columns "monospace"/"code" with backticks
df_table["Item #/offset"] = df_table["Item #/offset"].apply(lambda x: "`{}`".format(x))
df_table["Description/name"] = df_table["Description/name"].apply(lambda x: '"`{}`"'.format(x.strip()))

table_md = df_table.to_markdown(tablefmt="grid", index=False)
content_md = CONTENT_MD_DATA.format(table_md)
with open(CONTENT_MD_FNAME, 'w', encoding='utf-8') as f:
  f.write(content_md)

#print(content_md)
print("Saved {}".format(CONTENT_MD_FNAME))

#pandoc -s MANUAL.txt -o example4.tex
DEFAULTS_ARG = "--defaults={}".format(YAML_FNAME)
PANDOC_CMD1 = [PANDOC, DEFAULTS_ARG, "--toc", "-s", CONTENT_MD_FNAME, "-o", CONTENT_TEX_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD1)))
result = subprocess.run(PANDOC_CMD1, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd())
print(result.stderr)
if (result.returncode != 0):
  print("  Got err '{}' ...".format(result.stderr))

PANDOC_CMD2 = [PANDOC, DEFAULTS_ARG, "--toc", "--verbose", CONTENT_MD_FNAME, "-o", CONTENT_PDF_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD2)))
result = subprocess.run(PANDOC_CMD2, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd(), encoding='UTF-8')
prn_nextline = False
for line in result.stderr.splitlines():
  if (("written" in line) or ("makePDF" in line) or (prn_nextline)):
    print(line)
    if (prn_nextline):
      prn_nextline = False
    if (("temp dir:" in line) or ("Command line:" in line)):
      prn_nextline = True
if (result.returncode != 0):
  print("  Got err '{}' ...".format(result.stderr))

When I run it, I get the following output:

$ python3 table_to_md.py
Saved my_table_description.md
Running: pandoc --defaults=pandoc_style.yaml --toc -s my_table_description.md -o my_table_description.tex

Running: pandoc --defaults=pandoc_style.yaml --toc --verbose my_table_description.md -o my_table_description.pdf
[makePDF] temp dir:
C:/msys64/tmp/tex2pdf.-32afd74f419dcabc
[makePDF] Command line:
xelatex "-halt-on-error" "-interaction" "nonstopmode" "-output-directory" "D:/msys64/tmp/tex2pdf.-32afd74f419dcabc" "C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.tex"
[makePDF] Environment:
[makePDF] Source:
[makePDF] Run #1
Output written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.pdf (1 page).
Transcript written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.log.
[makePDF] Run #2
Output written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.pdf (1 page).
Transcript written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.log.
[makePDF] Run #3
Output written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.pdf (1 page).
Transcript written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.log.

...and for completeness, here is the generated my_table_description.tex:

% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
%
\documentclass[
  12pt,
  a4paper,
  table]{extarticle}
\usepackage{amsmath,amssymb}
\usepackage{lmodern}
\usepackage{iftex}
\ifPDFTeX
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
  \usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
  \usepackage{unicode-math}
  \defaultfontfeatures{Scale=MatchLowercase}
  \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
  \usepackage[]{microtype}
  \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
  \IfFileExists{parskip.sty}{%
    \usepackage{parskip}
  }{% else
    \setlength{\parindent}{0pt}
    \setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
  \KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\hypersetup{
  pdftitle={My table description document},
  pdfauthor={John Doe},
  hidelinks,
  pdfcreator={LaTeX via pandoc}}
\urlstyle{same} % disable monospaced font for URLs
\usepackage[margin=2cm]{geometry}
\usepackage{listings}
\newcommand{\passthrough}[1]{#1}
\lstset{defaultdialect=[5.3]Lua}
\lstset{defaultdialect=[x86masm]Assembler}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
\rowcolors{2}{gray!10}{gray!25}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{CMU Serif}
\unimathsetup{bold-style=ISO}
\lstset{% for listings
  basicstyle=\ttfamily,
  breaklines=true,
  postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
}
\ifLuaTeX
  \usepackage{selnolig}  % disable illegal ligatures
\fi

\title{My table description document}
\author{John Doe}
\date{}

\begin{document}
\maketitle

{
\setcounter{tocdepth}{3}
\tableofcontents
}
Here is the table, discussed in this document:

\begin{longtable}[]{@{}
  >{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.18}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.28}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.36}}
  >{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.18}}@{}}
\toprule
Item \#/offset & Description/name & Random info & Default value \\
\midrule
\endhead
\passthrough{\lstinline!0!} &
``\passthrough{\lstinline!Sum Difficult Copper!}'' & Recover; sculpture
& \begin{minipage}[t]{\linewidth}\raggedright
\begin{lstlisting}
          0
\end{lstlisting}
\end{minipage} \\
\passthrough{\lstinline!1!} &
``\passthrough{\lstinline!Joy Committee Tick!}'' & Reference; ball
average can loop & \begin{minipage}[t]{\linewidth}\raggedright
\begin{lstlisting}
          0
\end{lstlisting}
\end{minipage} \\
\bottomrule
\end{longtable}

Notice the table formatting!

\end{document}

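Answer 1

I don't think this is the simplest way to show a CSV as a table in a markdown document. If you are using Quarto, you can do it directly in the main document with Python (I don't know whether the example below is the best approach; I have little experience with this language, sorry) or with R (I only show the kable approach, but you could also use xtable or other methods). If you only export to LaTeX, you can also use a LaTeX approach to import the CSV into a LaTeX table without exporting to markdown first (not shown here, but this site has several answers on that). However, you can also simply fix by hand what you obtained, to keep the document as simple as possible. In any case, the line wrapping comes out right once the verbatim text is avoided.

Note that, for some reason, the colored background is wider than the booktabs rules (as can be seen in your own screenshot). I have not had time to look for the cause in the LaTeX output, but IMHO it is better to avoid colors with booktabs rules, which also add vertical space. That space would also produce ugly breaks in vertical rules, which you should always avoid anyway, as @cfr commented.

MWE compiled with Quarto: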

---
format: pdf
---

### With Python


```{python}
#| echo: false

import pandas as pd
df = pd.read_csv("foo.csv")

from IPython.display import display, Markdown

Markdown(df.to_markdown())

```

(Sorry, no idea how to remove row names in Python)


### With R 

```{r}
#| echo: false
#| output: asis

df  <- read.csv(file = "foo.csv",header=FALSE)
names(df) <- unname(unlist(df[1,]))
df <- df[-1,]

knitr::kable(df, row.names=FALSE,align="cllc")



```



### With Markdown (edited manually)

```{=tex}
\tabcolsep0pt
```

+---------------+--------------------------+----------------------------------+---------------+
| Item #/offset | Description/name         | Random info                      | Default value |
+:=============:+==========================+==================================+:=============:+
| 0             | Sum Difficult Copper     | Recover; sculpture               | 0             |
+---------------+--------------------------+----------------------------------+---------------+
| 1             | Joy Committee Tick       | Reference; ball average can loop | 0             |
+---------------+--------------------------+----------------------------------+---------------+

(image: mwe)

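Answer 2

OK, I found a solution based on two changes, neither of which is directly related to LaTeX:

  1. Use line breaks in the .csv to force the column names to break

Looking at the OP, we can see that the column labels "Item #/offset" and "Default value" are much longer than their typical content -> this ends up shaping the grid table Markdown format -> which ultimately determines the relative widths of the LaTeX/PDF columns.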
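The link between the grid table and the column widths can be checked with a little arithmetic. The sketch below assumes that each column's share is simply its dash count divided by the total dash count in the separator line; this is an approximation for illustration, not pandoc's documented rule, and the helper names are made up:

```python
def relative_widths(separator_line: str) -> list[float]:
    """Share of each column in a grid-table separator line, by dash count."""
    segments = [seg for seg in separator_line.strip().split("+") if seg]
    total = sum(len(seg) for seg in segments)
    return [round(len(seg) / total, 2) for seg in segments]


def separator(widths):
    """Rebuild a grid-table separator line from per-column dash counts."""
    return "+" + "+".join("-" * w for w in widths) + "+"


# Dash counts read off the two grid tables in this thread.
original = separator([17, 26, 34, 17])   # long one-line headers
shortened = separator([11, 26, 34, 11])  # headers broken over two lines

print(relative_widths(original))
print(relative_widths(shortened))
```

For the original table this gives 0.18, 0.28, 0.36 and 0.18, which agrees with the \real{...} factors in the generated .tex in the question. With the shortened headers, the two middle columns rise to about 0.32 and 0.41 of the table width.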

So, inserting line breaks in the .csv like this:

"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0

... results in:

+-----------+--------------------------+----------------------------------+-----------+
| Item #    | Description/name         | Random info                      | Default   |
| /offset   |                          |                                  | value     |
+===========+==========================+==================================+===========+
| `0`       | "`Sum Difficult Copper`" | Recover; sculpture               | 0         |
+-----------+--------------------------+----------------------------------+-----------+
| `1`       | "`Joy Committee Tick`"   | Reference; ball average can loop | 0         |
+-----------+--------------------------+----------------------------------+-----------+
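
  2. Use colalign and numalign in the .to_markdown call in Python

I noticed this in the pandas.DataFrame.to_markdown page of the pandas 2.1.1 documentation:

**kwargs: These parameters will be passed to tabulate.

... and tabulate has colalign and numalign parameters. It detects numbers automatically, and by default numbers end up right-aligned, but in that case the numbers tend to "stick" to the bottom.

So, for me, the solution was to use numalign="left", colalign=("left",) in the .to_markdown call; this left-aligns the numbers in the last column (the table above also shows this effect) and, finally, gives a tighter automatic table layout: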

(image: output_pdf)

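Here is the modified version of table_to_md.py (if on Windows, make sure to save it with Unix line endings, \n, for example in Notepad++; otherwise some assumptions made by the script will break):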

#!/usr/bin/env python3

import os
import pandas as pd
import subprocess

PANDOC = "pandoc"
TABLE_FNAME = "table.csv"
TABLE_DATA="""\
"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0
"""
YAML_FNAME = "pandoc_style.yaml"
YAML_DATA = r"""
pdf-engine: xelatex
filters:
- pandoc-crossref
- citeproc

metadata:
  link-citations: true # works fine here!

listings: true

variables:
  geometry: margin=2cm
  classoption: table
  documentclass: extarticle
  numbersections: true
  papersize: a4
  fontsize: 12pt
  unicode-math: bold-style=ISO
  listings: true
  header-includes: |
    \rowcolors{2}{gray!10}{gray!25}
    \usepackage{fontspec}
    \setmainfont[Ligatures=TeX]{CMU Serif}
    \unimathsetup{bold-style=ISO}
    \lstset{% for listings
      basicstyle=\ttfamily,
      breaklines=true,
      postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
    }
"""


CONTENT_MD_FNAME="my_table_description.md"
CONTENT_TEX_FNAME="my_table_description.tex"
CONTENT_PDF_FNAME="my_table_description.pdf"
CONTENT_MD_DATA="""\
---
title: "My table description document"
author: John Doe

---

Here is the table, discussed in this document:

{}

Notice the table formatting!
"""


with open(TABLE_FNAME, 'wb') as f:
  f.write(bytes(TABLE_DATA, encoding='utf-8'))
with open(YAML_FNAME, 'w', encoding='utf-8') as f:
  f.write(YAML_DATA)

df_table = pd.read_csv(TABLE_FNAME)

print(df_table.columns)

# make these columns "monospace"/"code" with backticks
df_table["Item #\n/offset"] = df_table["Item #\n/offset"].apply(lambda x: "`{}`".format(x))
df_table["Description/name"] = df_table["Description/name"].apply(lambda x: '"`{}`"'.format(x.strip()))

table_md = df_table.to_markdown(tablefmt="grid", index=False, numalign="left", colalign=("left",))
content_md = CONTENT_MD_DATA.format(table_md)
with open(CONTENT_MD_FNAME, 'w', encoding='utf-8') as f:
  f.write(content_md)

#print(content_md)
print("Saved {}".format(CONTENT_MD_FNAME))

#pandoc -s MANUAL.txt -o example4.tex
DEFAULTS_ARG = "--defaults={}".format(YAML_FNAME)
PANDOC_CMD1 = [PANDOC, DEFAULTS_ARG, "--toc", "-s", CONTENT_MD_FNAME, "-o", CONTENT_TEX_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD1)))
result = subprocess.run(PANDOC_CMD1, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd())
print(result.stderr)
if (result.returncode != 0):
  print("  Got err '{}' ...".format(result.stderr))

PANDOC_CMD2 = [PANDOC, DEFAULTS_ARG, "--toc", "--verbose", CONTENT_MD_FNAME, "-o", CONTENT_PDF_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD2)))
result = subprocess.run(PANDOC_CMD2, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd(), encoding='UTF-8')
prn_nextline = False
for line in result.stderr.splitlines():
  if (("written" in line) or ("makePDF" in line) or (prn_nextline)):
    print(line)
    if (prn_nextline):
      prn_nextline = False
    if (("temp dir:" in line) or ("Command line:" in line)):
      prn_nextline = True
if (result.returncode != 0):
  print("  Got err '{}' ...".format(result.stderr))
