Can Pandoc Markdown tables be formatted more tightly for xelatex PDF output?

Question 1

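I don't think this is the simplest way to show a CSV as a table in a markdown document. If you are using Quarto, you can do it with Python directly in the main document (I don't know whether the example below is the best approach, I have little experience with this language, sorry) or with R (I show only the `kable` approach, but you can also use `xtable` and others). If you are exporting only to LaTeX, you can also use a LaTeX approach to import the CSV into a LaTeX table, without first exporting to markdown (not shown here, but there are several answers on this site about it). However, you can also simply fix by hand what you obtained, to keep the document as simple as possible. In any case, the line breaks are corrected and the verbatim text is avoided.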

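Note that, for some reason, the colored background is wider than the booktabs rules (as shown in your own screenshot). I have not had time to check why in the LaTeX output, but IMHO it is better to avoid colors with booktabs rules, which also add vertical space. That space would also cause nasty breaks in vertical rules, which you should always avoid anyway, as commented by @cfr.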

MWE to compile with Quarto:

---
format: pdf
---

### With Python


```{python}
#| echo: false

import pandas as pd
df = pd.read_csv("foo.csv")

from IPython.display import display, Markdown

Markdown(df.to_markdown())

```

(Row names can be dropped by passing `index=False` to `to_markdown`.)


### With R 

```{r}
#| echo: false
#| output: asis

df  <- read.csv(file = "foo.csv",header=FALSE)
names(df) <- unname(unlist(df[1,]))
df <- df[-1,]

knitr::kable(df, row.names=FALSE,align="cllc")



```



### With Markdown (edited manually)

```{=tex}
\tabcolsep0pt
```

+---------------+--------------------------+----------------------------------+---------------+
| Item #/offset | Description/name         | Random info                      | Default value |
+:=============:+==========================+==================================+:=============:+
| 0             | Sum Difficult Copper     | Recover; sculpture               | 0             |
+---------------+--------------------------+----------------------------------+---------------+
| 1             | Joy Committee Tick       | Reference; ball average can loop | 0             |
+---------------+--------------------------+----------------------------------+---------------+


Question 2

OK - I found a solution based on two changes, neither of which is directly related to LaTeX:

Use line breaks in the `.csv` to force the column names to break

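Looking at the OP, we can see that the column labels "Item #/offset" and "Default value" are much longer than their typical content -> this ends up driving the grid table Markdown format -> which ultimately determines the relative widths of the LaTeX/PDF columns.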
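To make the effect of long headers concrete, here is a minimal sketch that compares each column's share of the dashes in a grid-table border line. The helper names `column_shares` and `make_border` are made up for this illustration, and the calculation is only a rough proxy for the proportions described above, not Pandoc's actual width algorithm.

```python
def column_shares(border):
    """Return each column's share of the characters in a grid-table border line."""
    # A border looks like "+-----+--------+"; the pieces between "+" are the columns.
    widths = [len(part) for part in border.strip().strip("+").split("+")]
    total = sum(widths)
    return [w / total for w in widths]


def make_border(widths):
    """Build a grid-table border line from per-column dash counts."""
    return "+" + "+".join("-" * w for w in widths) + "+"


# Dash counts for a table with the single-line headers "Item #/offset" and
# "Default value", and for the same table with those headers on two lines.
before = make_border([15, 26, 34, 15])
after = make_border([11, 26, 34, 11])

for label, border in (("before", before), ("after", after)):
    shares = ", ".join(f"{s:.0%}" for s in column_shares(border))
    print(f"{label}: {shares}")
```

Under this rough measure, the first and last columns drop from about 17% to about 13% of the table width once the headers are split over two lines, which leaves more room for the two text columns.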

So, inserting line breaks in the `.csv` like this:

"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0

... results in:

+-----------+--------------------------+----------------------------------+-----------+
| Item #    | Description/name         | Random info                      | Default   |
| /offset   |                          |                                  | value     |
+===========+==========================+==================================+===========+
| `0`       | "`Sum Difficult Copper`" | Recover; sculpture               | 0         |
+-----------+--------------------------+----------------------------------+-----------+
| `1`       | "`Joy Committee Tick`"   | Reference; ball average can loop | 0         |
+-----------+--------------------------+----------------------------------+-----------+
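To check how such a file is read, here is a small sketch that parses the same CSV text from memory with `pandas.read_csv`. Reading from an `io.StringIO` buffer instead of a file, and the name `CSV_TEXT`, are choices made only for this example. The point is that a quoted field may contain a line break, and that the line break then becomes part of the column name.

```python
import io

import pandas as pd

# Same content as the .csv above; the quoted header fields span two lines.
CSV_TEXT = '''"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

# The embedded newline is kept in the column names, so any later column
# lookup has to include it, e.g. df["Item #\n/offset"].
print(list(df.columns))
print(df.shape)
```

This is why a script that post-processes these columns has to refer to them with the `\n` included in the name.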

Use `colalign` and `numalign` in the `.to_markdown` call in Python

I noticed this in the pandas 2.1.1 documentation for `pandas.DataFrame.to_markdown`:

`**kwargs`: These parameters will be passed to tabulate.

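... and tabulate has `colalign` and `numalign` parameters. It detects numbers automatically, and by default numbers end up aligned to the "right" - but in this case the numbers then tend to "stick" to the bottom.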

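So, for me, the solution was to use `numalign="left", colalign=("left",)` in the `.to_markdown` call; this left-aligns the numbers in the last column (the table above also shows this effect) - and finally gives a tighter automatic table layout.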

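Here is the modified version of `table_to_md.py` (if you are on Windows, make sure to save it with Unix line endings, `\n`, e.g. in Notepad++, otherwise some of the assumptions the script makes will break):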

#!/usr/bin/env python3

import os
import pandas as pd
import subprocess

PANDOC = "pandoc"
TABLE_FNAME = "table.csv"
TABLE_DATA="""\
"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper   ,Recover; sculpture,0
1,Joy Committee Tick     ,Reference; ball average can loop,0
"""
YAML_FNAME = "pandoc_style.yaml"
YAML_DATA = r"""
pdf-engine: xelatex
filters:
- pandoc-crossref
- citeproc

metadata:
  link-citations: true # works fine here!

listings: true

variables:
  geometry: margin=2cm
  classoption: table
  documentclass: extarticle
  numbersections: true
  papersize: a4
  fontsize: 12pt
  unicode-math: bold-style=ISO
  listings: true
  header-includes: |
    \rowcolors{2}{gray!10}{gray!25}
    \usepackage{fontspec}
    \setmainfont[Ligatures=TeX]{CMU Serif}
    \unimathsetup{bold-style=ISO}
    \lstset{% for listings
      basicstyle=\ttfamily,
      breaklines=true,
      postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
    }
"""


CONTENT_MD_FNAME="my_table_description.md"
CONTENT_TEX_FNAME="my_table_description.tex"
CONTENT_PDF_FNAME="my_table_description.pdf"
CONTENT_MD_DATA="""\
---
title: "My table description document"
author: John Doe

---

Here is the table, discussed in this document:

{}

Notice the table formatting!
"""


with open(TABLE_FNAME, 'wb') as f:
  f.write(bytes(TABLE_DATA, encoding='utf-8'))
with open(YAML_FNAME, 'w', encoding='utf-8') as f:
  f.write(YAML_DATA)

df_table = pd.read_csv(TABLE_FNAME)

print(df_table.columns)

# make these columns "monospace"/"code" with backticks
df_table["Item #\n/offset"] = df_table["Item #\n/offset"].apply(lambda x: "`{}`".format(x))
df_table["Description/name"] = df_table["Description/name"].apply(lambda x: '"`{}`"'.format(x.strip()))

table_md = df_table.to_markdown(tablefmt="grid", index=False, numalign="left", colalign=("left",))
content_md = CONTENT_MD_DATA.format(table_md)
with open(CONTENT_MD_FNAME, 'w', encoding='utf-8') as f:
  f.write(content_md)

#print(content_md)
print("Saved {}".format(CONTENT_MD_FNAME))

#pandoc -s MANUAL.txt -o example4.tex
DEFAULTS_ARG = "--defaults={}".format(YAML_FNAME)
PANDOC_CMD1 = [PANDOC, DEFAULTS_ARG, "--toc", "-s", CONTENT_MD_FNAME, "-o", CONTENT_TEX_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD1)))
result = subprocess.run(PANDOC_CMD1, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd())
print(result.stderr)
if (result.returncode != 0):
  print("  Got err '{}' ...".format(result.stderr))

PANDOC_CMD2 = [PANDOC, DEFAULTS_ARG, "--toc", "--verbose", CONTENT_MD_FNAME, "-o", CONTENT_PDF_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD2)))
result = subprocess.run(PANDOC_CMD2, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd(), encoding='UTF-8')
prn_nextline = False
for line in result.stderr.splitlines():
  if (("written" in line) or ("makePDF" in line) or (prn_nextline)):
    print(line)
    if (prn_nextline):
      prn_nextline = False
    if (("temp dir:" in line) or ("Command line:" in line)):
      prn_nextline = True
if (result.returncode != 0):
  print("  Got err '{}' ...".format(result.stderr))

Answer