I have a pipeline that starts from a .csv file (the complete MWE is given as a Python script below):
Item #/offset,Description/name,Random info,Default value
0,Sum Difficult Copper ,Recover; sculpture,0
1,Joy Committee Tick ,Reference; ball average can loop,0
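...then I use Python's pandas to convert it to a Markdown "grid" table, choosing to "format" certain columns with backticks so that they are set in a monospace font, and insert it into a Markdown document: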
---
title: "My table description document"
author: John Doe
---
Here is the table, discussed in this document:
+-----------------+--------------------------+----------------------------------+-----------------+
| Item #/offset   | Description/name         | Random info                      |   Default value |
+=================+==========================+==================================+=================+
| `0`             | "`Sum Difficult Copper`" | Recover; sculpture               |               0 |
+-----------------+--------------------------+----------------------------------+-----------------+
| `1`             | "`Joy Committee Tick`"   | Reference; ball average can loop |               0 |
+-----------------+--------------------------+----------------------------------+-----------------+
Notice the table formatting!
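...and finally, pandoc converts this Markdown to PDF via xelatex, producing the following output:
(screenshot of the rendered PDF table)
First of all, this table content would definitely fit on one line per row; I have no idea why "Sum Difficult Copper" gets broken/line-wrapped. Also, why do the numbers wrapped in backticks stick to the "top" of the cell and align left, while the plain numbers stick to the "bottom" and align right?
The red line-break marker shows that wrapping of the verbatim text definitely has an effect; however, "Reference; ball average can loop" is also broken across lines, even though it is normal text?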
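My questions are:
- Is there some simple solution at the LaTeX level (e.g. including some package) that would "prioritize" fitting the cell contents so that each row fits on one line, regardless of whether I use backticks for monospace inside the table?
- (Side question, and I know this is not the right forum for it: is there a way to control the left/right alignment of cells, or even the top/bottom alignment, maybe already in the Python .to_markdown call?)
- Is there some simple solution at the LaTeX level (e.g. including some package) so that I get table cell borders ("outer border and all inner lines", as it is called in LibreOffice Calc)?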
Here is the Python script, table_to_md.py:
#!/usr/bin/env python3
import os
import pandas as pd
import subprocess
PANDOC = "pandoc"
TABLE_FNAME = "table.csv"
TABLE_DATA="""\
Item #/offset,Description/name,Random info,Default value
0,Sum Difficult Copper ,Recover; sculpture,0
1,Joy Committee Tick ,Reference; ball average can loop,0
"""
YAML_FNAME = "pandoc_style.yaml"
YAML_DATA = r"""
pdf-engine: xelatex
filters:
- pandoc-crossref
- citeproc
metadata:
link-citations: true # works fine here!
listings: true
variables:
geometry: margin=2cm
classoption: table
documentclass: extarticle
numbersections: true
papersize: a4
fontsize: 12pt
unicode-math: bold-style=ISO
listings: true
header-includes: |
\rowcolors{2}{gray!10}{gray!25}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{CMU Serif}
\unimathsetup{bold-style=ISO}
\lstset{% for listings
basicstyle=\ttfamily,
breaklines=true,
postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
}
"""
CONTENT_MD_FNAME="my_table_description.md"
CONTENT_TEX_FNAME="my_table_description.tex"
CONTENT_PDF_FNAME="my_table_description.pdf"
CONTENT_MD_DATA="""\
---
title: "My table description document"
author: John Doe
---
Here is the table, discussed in this document:
{}
Notice the table formatting!
"""
with open(TABLE_FNAME, 'w', encoding='utf-8') as f:
    f.write(TABLE_DATA)
with open(YAML_FNAME, 'w', encoding='utf-8') as f:
    f.write(YAML_DATA)
df_table = pd.read_csv(TABLE_FNAME)
# make these columns "monospace"/"code" with backticks
df_table["Item #/offset"] = df_table["Item #/offset"].apply(lambda x: "`{}`".format(x))
df_table["Description/name"] = df_table["Description/name"].apply(lambda x: '"`{}`"'.format(x.strip()))
table_md = df_table.to_markdown(tablefmt="grid", index=False)
content_md = CONTENT_MD_DATA.format(table_md)
with open(CONTENT_MD_FNAME, 'w', encoding='utf-8') as f:
    f.write(content_md)
#print(content_md)
print("Saved {}".format(CONTENT_MD_FNAME))
#pandoc -s MANUAL.txt -o example4.tex
DEFAULTS_ARG = "--defaults={}".format(YAML_FNAME)
PANDOC_CMD1 = [PANDOC, DEFAULTS_ARG, "--toc", "-s", CONTENT_MD_FNAME, "-o", CONTENT_TEX_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD1)))
result = subprocess.run(PANDOC_CMD1, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd())
print(result.stderr)
if (result.returncode != 0):
    print(" Got err '{}' ...".format(result.stderr))
PANDOC_CMD2 = [PANDOC, DEFAULTS_ARG, "--toc", "--verbose", CONTENT_MD_FNAME, "-o", CONTENT_PDF_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD2)))
result = subprocess.run(PANDOC_CMD2, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd(), encoding='UTF-8')
prn_nextline = False
for line in result.stderr.splitlines():
    if (("written" in line) or ("makePDF" in line) or (prn_nextline)):
        print(line)
        if (prn_nextline):
            prn_nextline = False
    if (("temp dir:" in line) or ("Command line:" in line)):
        prn_nextline = True
if (result.returncode != 0):
    print(" Got err '{}' ...".format(result.stderr))
When I run it, I get the following output:
$ python3 table_to_md.py
Saved my_table_description.md
Running: pandoc --defaults=pandoc_style.yaml --toc -s my_table_description.md -o my_table_description.tex
Running: pandoc --defaults=pandoc_style.yaml --toc --verbose my_table_description.md -o my_table_description.pdf
[makePDF] temp dir:
C:/msys64/tmp/tex2pdf.-32afd74f419dcabc
[makePDF] Command line:
xelatex "-halt-on-error" "-interaction" "nonstopmode" "-output-directory" "D:/msys64/tmp/tex2pdf.-32afd74f419dcabc" "C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.tex"
[makePDF] Environment:
[makePDF] Source:
[makePDF] Run #1
Output written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.pdf (1 page).
Transcript written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.log.
[makePDF] Run #2
Output written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.pdf (1 page).
Transcript written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.log.
[makePDF] Run #3
Output written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.pdf (1 page).
Transcript written on C:/msys64/tmp/tex2pdf.-32afd74f419dcabc/input.log.
...and for completeness, here is the generated my_table_description.tex:
% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
%
\documentclass[
12pt,
a4paper,
table]{extarticle}
\usepackage{amsmath,amssymb}
\usepackage{lmodern}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\hypersetup{
pdftitle={My table description document},
pdfauthor={John Doe},
hidelinks,
pdfcreator={LaTeX via pandoc}}
\urlstyle{same} % disable monospaced font for URLs
\usepackage[margin=2cm]{geometry}
\usepackage{listings}
\newcommand{\passthrough}[1]{#1}
\lstset{defaultdialect=[5.3]Lua}
\lstset{defaultdialect=[x86masm]Assembler}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
\rowcolors{2}{gray!10}{gray!25}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{CMU Serif}
\unimathsetup{bold-style=ISO}
\lstset{% for listings
basicstyle=\ttfamily,
breaklines=true,
postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
}
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\title{My table description document}
\author{John Doe}
\date{}
\begin{document}
\maketitle
{
\setcounter{tocdepth}{3}
\tableofcontents
}
Here is the table, discussed in this document:
\begin{longtable}[]{@{}
>{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.18}}
>{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.28}}
>{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.36}}
>{\raggedright\arraybackslash}p{(\columnwidth - 6\tabcolsep) * \real{0.18}}@{}}
\toprule
Item \#/offset & Description/name & Random info & Default value \\
\midrule
\endhead
\passthrough{\lstinline!0!} &
``\passthrough{\lstinline!Sum Difficult Copper!}'' & Recover; sculpture
& \begin{minipage}[t]{\linewidth}\raggedright
\begin{lstlisting}
0
\end{lstlisting}
\end{minipage} \\
\passthrough{\lstinline!1!} &
``\passthrough{\lstinline!Joy Committee Tick!}'' & Reference; ball
average can loop & \begin{minipage}[t]{\linewidth}\raggedright
\begin{lstlisting}
0
\end{lstlisting}
\end{minipage} \\
\bottomrule
\end{longtable}
Notice the table formatting!
\end{document}
Answer 1
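I don't think this is the simplest way to show a CSV as a table in a markdown document. If you are using Quarto, you can do it directly in the main document with Python (I don't know whether the example below is the best approach; I have little experience with this language, sorry) or with R (I show only the kable approach, but you could also use xtable or other methods). If you are only exporting to LaTeX, you can also use a LaTeX approach to import the CSV into a LaTeX table without first exporting to markdown (not shown here, but this site has several answers on that). However, you can also simply fix by hand what you already have, to keep the document as simple as possible. In all of these cases the line wrapping is correct, because the verbatim text is avoided.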
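Note that, for some reason, the coloured background is wider than the booktabs rules (as you can see in your own screenshot). I haven't had time to look into the LaTeX output for the cause, but IMHO it is best to avoid using colours with booktabs rules, which also add vertical space. That extra space would also cause ugly breaks in vertical rules, which you should always avoid anyway, as @cfr commented.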
MWE compiled with Quarto:
---
format: pdf
---
### With Python
```{python}
#| echo: false
import pandas as pd
df = pd.read_csv("foo.csv")
from IPython.display import display, Markdown
Markdown(df.to_markdown())
```
(Sorry, no idea how to remove row names in Python)
### With R
```{r}
#| echo: false
#| output: asis
df <- read.csv(file = "foo.csv",header=FALSE)
names(df) <- unname(unlist(df[1,]))
df <- df[-1,]
knitr::kable(df, row.names=FALSE,align="cllc")
```
### With Markdown (edited manually)
```{=tex}
\tabcolsep0pt
```
+---------------+--------------------------+----------------------------------+---------------+
| Item #/offset | Description/name         | Random info                      | Default value |
+:=============:+==========================+==================================+:=============:+
| 0             | Sum Difficult Copper     | Recover; sculpture               | 0             |
+---------------+--------------------------+----------------------------------+---------------+
| 1             | Joy Committee Tick       | Reference; ball average can loop | 0             |
+---------------+--------------------------+----------------------------------+---------------+
Answer 2
OK, I found a solution of sorts based on two changes, neither of which is directly related to LaTeX:
- Force the column names to break by using line breaks in the .csv
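Looking at the OP, we can see that the column labels "Item #/offset" and "Default value" are much longer than their typical content. This ends up in the grid-table Markdown formatting, which in turn determines the relative widths of the LaTeX/PDF columns.
So, inserting line breaks in the .csv like this: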
"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper ,Recover; sculpture,0
1,Joy Committee Tick ,Reference; ball average can loop,0
...results in:
+-----------+--------------------------+----------------------------------+-----------+
| Item #    | Description/name         | Random info                      | Default   |
| /offset   |                          |                                  | value     |
+===========+==========================+==================================+===========+
| `0`       | "`Sum Difficult Copper`" | Recover; sculpture               | 0         |
+-----------+--------------------------+----------------------------------+-----------+
| `1`       | "`Joy Committee Tick`"   | Reference; ball average can loop | 0         |
+-----------+--------------------------+----------------------------------+-----------+
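The link between the rule lines and the column widths can be checked with a little arithmetic. The sketch below is only an illustration: rule and column_shares are hypothetical helper names, and this is not pandoc's actual code. It assumes that each column's relative width is simply its share of the dashes in the rule line. Under that assumption, the original table reproduces the \real{0.18}, \real{0.28}, \real{0.36}, \real{0.18} factors seen in the generated .tex from the question.

```python
def rule(widths):
    """Build a grid-table rule line such as '+---+-----+' from dash counts."""
    return "+" + "+".join("-" * w for w in widths) + "+"


def column_shares(rule_line):
    """Return each column's share of the dashes in a rule line (assumed model)."""
    # Splitting on "+" leaves one run of dashes per column,
    # plus empty strings at both ends, which are filtered out.
    runs = [seg for seg in rule_line.strip().split("+") if seg]
    total = sum(len(run) for run in runs)
    return [round(len(run) / total, 2) for run in runs]


original = rule([17, 26, 34, 17])   # rule line of the table in the question
shortened = rule([11, 26, 34, 11])  # rule line after breaking the headers

print(column_shares(original))   # [0.18, 0.28, 0.36, 0.18]
print(column_shares(shortened))
```

With the shortened headers, the two middle columns receive a larger share of the total width, which is consistent with their text now fitting on one line.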
- Use colalign and numalign in the .to_markdown call in Python
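I noticed this in the pandas 2.1.1 documentation for pandas.DataFrame.to_markdown:
**kwargs: These parameters will be passed to tabulate.
...and tabulate has the colalign and numalign parameters. It detects numbers automatically, and the default alignment for numbers is "right"; in this case, however, the right-aligned numbers tend to "stick" to the bottom of the cell.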
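So, for me, the solution was to use numalign="left", colalign=("left",) in the .to_markdown call; this left-aligns the numbers in the last column (the table above already shows this effect) and, finally, gives a tighter automatic table layout.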
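Here is the modified version of table_to_md.py (if you are on Windows, make sure to save it with Unix line endings, \n, e.g. in Notepad++; otherwise some of the assumptions the script makes will break):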
#!/usr/bin/env python3
import os
import pandas as pd
import subprocess
PANDOC = "pandoc"
TABLE_FNAME = "table.csv"
TABLE_DATA="""\
"Item #
/offset",Description/name,Random info,"Default
value"
0,Sum Difficult Copper ,Recover; sculpture,0
1,Joy Committee Tick ,Reference; ball average can loop,0
"""
YAML_FNAME = "pandoc_style.yaml"
YAML_DATA = r"""
pdf-engine: xelatex
filters:
- pandoc-crossref
- citeproc
metadata:
link-citations: true # works fine here!
listings: true
variables:
geometry: margin=2cm
classoption: table
documentclass: extarticle
numbersections: true
papersize: a4
fontsize: 12pt
unicode-math: bold-style=ISO
listings: true
header-includes: |
\rowcolors{2}{gray!10}{gray!25}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{CMU Serif}
\unimathsetup{bold-style=ISO}
\lstset{% for listings
basicstyle=\ttfamily,
breaklines=true,
postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space},
}
"""
CONTENT_MD_FNAME="my_table_description.md"
CONTENT_TEX_FNAME="my_table_description.tex"
CONTENT_PDF_FNAME="my_table_description.pdf"
CONTENT_MD_DATA="""\
---
title: "My table description document"
author: John Doe
---
Here is the table, discussed in this document:
{}
Notice the table formatting!
"""
with open(TABLE_FNAME, 'wb') as f:
    f.write(bytes(TABLE_DATA, encoding='utf-8'))
with open(YAML_FNAME, 'w', encoding='utf-8') as f:
    f.write(YAML_DATA)
df_table = pd.read_csv(TABLE_FNAME)
print(df_table.columns)
# make these columns "monospace"/"code" with backticks
df_table["Item #\n/offset"] = df_table["Item #\n/offset"].apply(lambda x: "`{}`".format(x))
df_table["Description/name"] = df_table["Description/name"].apply(lambda x: '"`{}`"'.format(x.strip()))
table_md = df_table.to_markdown(tablefmt="grid", index=False, numalign="left", colalign=("left",))
content_md = CONTENT_MD_DATA.format(table_md)
with open(CONTENT_MD_FNAME, 'w', encoding='utf-8') as f:
    f.write(content_md)
#print(content_md)
print("Saved {}".format(CONTENT_MD_FNAME))
#pandoc -s MANUAL.txt -o example4.tex
DEFAULTS_ARG = "--defaults={}".format(YAML_FNAME)
PANDOC_CMD1 = [PANDOC, DEFAULTS_ARG, "--toc", "-s", CONTENT_MD_FNAME, "-o", CONTENT_TEX_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD1)))
result = subprocess.run(PANDOC_CMD1, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd())
print(result.stderr)
if (result.returncode != 0):
    print(" Got err '{}' ...".format(result.stderr))
PANDOC_CMD2 = [PANDOC, DEFAULTS_ARG, "--toc", "--verbose", CONTENT_MD_FNAME, "-o", CONTENT_PDF_FNAME]
print("Running: {}".format(" ".join(PANDOC_CMD2)))
result = subprocess.run(PANDOC_CMD2, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, cwd=os.getcwd(), encoding='UTF-8')
prn_nextline = False
for line in result.stderr.splitlines():
    if (("written" in line) or ("makePDF" in line) or (prn_nextline)):
        print(line)
        if (prn_nextline):
            prn_nextline = False
    if (("temp dir:" in line) or ("Command line:" in line)):
        prn_nextline = True
if (result.returncode != 0):
    print(" Got err '{}' ...".format(result.stderr))
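The script above looks up the first column as df_table["Item #\n/offset"], so it relies on the quoted header containing a bare \n. The sketch below shows how such a header parses, and how the field name would differ if the line break were \r\n instead. It uses Python's standard csv module rather than pandas, so treat it as an analogy: pandas.read_csv may handle this differently, and header_fields is a hypothetical helper name.

```python
import csv
import io


def header_fields(csv_text):
    """Parse only the header record of a CSV string."""
    # newline="" keeps line endings untranslated, as the csv docs recommend.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return next(reader)


unix_csv = (
    '"Item #\n/offset",Description/name,Random info,"Default\nvalue"\n'
    "0,Sum Difficult Copper ,Recover; sculpture,0\n"
)
windows_csv = unix_csv.replace("\n", "\r\n")  # same data, CRLF line endings

print(header_fields(unix_csv))     # first field is 'Item #\n/offset'
print(header_fields(windows_csv))  # first field is 'Item #\r\n/offset'
```

In both cases the quoted header is read as a single field spanning two lines, but only the first variant matches the key used in the script.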