将 CSV 读取为交叉表/数据透视表

将 CSV 读取为交叉表/数据透视表

我有一张如下所示的表格:

值交叉表

注意:此表是由我的前任在 winword 中生成的,因此我没有它的 MWE。我尝试这样做,但失败了,因为对表进行基于行的操作显然不是那么简单。

我可以完全控制底层的 CSV/TSV 数据,并且我正在尝试生成 LaTeX 格式的表格版本,以便在数据发生变化时每年更新。

我可以使用 Excel 中的数据透视表获得类似的结果(请参见下文,但我不知道如何将其转换为 LaTeX,除非将 py 脚本放入我的 makefile 中以动态生成表格。有没有纯 LaTeX 方式可以做到这一点?

在此处输入图片描述

编辑:这是上表的 MVE:

\documentclass{article}
\usepackage{tabularx}
\usepackage{booktabs}
\usepackage{amsmath}
    
\begin{document}
\begin{tabular}{rrrr|rrr}
\cmidrule{2-7}
&\multicolumn{3}{c|}{2019}& \multicolumn{3}{c}{2020}\\
&Applied&Admitted&\%&Applied&Admitted&\%\\
\cmidrule{2-7}
Resident Freshmen      & 1143 &  918 & 80.3\% & 1094 & 1003 & 91.7\%\\
Non-Resident Freshmen  & 1371 & 1048 & 76.4\% & 1223 & 1090 & 89.1\%\\
\textbf{Total Freshmen} & 2514 & 1966 & 78.2\% & 2317 & 2093 & 90.3\%\\
\cmidrule{2-7}
Resident Transfer       & 404 & 358 & 88.6\% & 406 & 354 & 87.2\%\\
Non-Resident Transfer   & 371 & 313 & 84.4\% & 356 & 288 & 80.9\%\\
\textbf{Total Transfer} & 775 & 671 & 86.6\% & 762 & 642 & 84.3\%\\
\cmidrule{2-7}
Grad/PostBacc & 418 & 311 & 74.4\% & 374 & 282 & 75.4\%\\
\cmidrule{2-7}
\textbf{Grand Total} & 3707 & 2948 & 79.5\% & 3453 & 3017 & 87.4\%\\
\cmidrule{2-7}
%\bottomrule
\end{tabular}   
\end{document}

上述代码的编译形式

我的 CSV 数据是可变的(因为我控制生成它的数据库),但至少我认为我需要这样的东西

year, decision, residency, class, count
...
2019,admitted,resident,freshmen, 1143
...
2020, applied,, Grad/PostBacc, 374

答案1

我不知道如何在纯 TeX 中做到这一点,所以这里有一个包含各种程序的工具链可以解决这个问题:

Directory structure:
/
+-Data
| +-app_admit_fall.csv
+-Templates
| +-app_admit_fall.pytex
+-Generated
| +-app_admit_fall.tex
+-Makefile
+-generate_file.py
+-document.tex
+-document.pdf
# Makefile
document.pdf: document.tex app_admit_fall.tex
    rm document.pdf
    rubber --pdf document.tex

# Dependent on the ETL, the data, and the formatting. 
app_admit_fall.tex : generate_file.py Data/app_admit_fall.csv Templates/app_admit_fall.pytex
    python3 generate_file.py app_admit_fall

基础文件:

% document.tex
\documentclass{article}
\usepackage{booktabs}
\begin{document}
\input{Generated/app_admit_fall.tex}
\end{document}

这是 Python 用于 tex 文件的“模板”文件。您可以随意生成它。就我自己而言,我使用这种格式,@@{n}因为它几乎肯定不会在“普通”TeX 中找到

% Templates/app_admit_fall.pytex
\begin{tabular}{rrrr|rrr}
    \cmidrule{2-7}
    &\multicolumn{3}{c|}{@@{1}}& \multicolumn{3}{c}{@@{10}}\\
    &Applied&Admitted&\%&Applied&Admitted&\%\\
    \cmidrule{2-7}
    Resident Freshmen      & @@{2} & @@{11}\\
    Non-Resident Freshmen  & @@{3} & @@{12}\\
    \textbf{Total Freshmen} & @@{4} & @@{13}\\
    \cmidrule{2-7}
    Resident Transfer       & @@{5} & @@{14}\\
    Non-Resident Transfer   & @@{6} & @@{15}\\
    \textbf{Total Transfer} & @@{7} & @@{16}\\
    \cmidrule{2-7}
    Grad/PostBacc & @@{8} & @@{17}\\
    \cmidrule{2-7}
    \textbf{Grand Total} & @@{9} & @@{18}\\
    \cmidrule{2-7}
\end{tabular}
#!python3
#generate_file.py
### Written by Jakob Lovern
### This file contains functions to generate static tables from the .pytex
### templates in /Templates and the data in /Data. It is *NOT* automatic. 
import sys
import pandas as pd

# Fill the applications/admissions table from app_admit_fall.csv
# Very quick n dirty
def app_admit_fall():
    df = pd.read_csv("Data/app_admit_fall.csv")

    count = 1
    replace_strings = dict()
    def comp_string(input):
        applied = sum(input.query("Decision == 'Applied'")["Count"])
        admitted = sum(input.query("Decision == 'Admitted'")["Count"])
        pct = "%.1f"%(100*admitted/applied)
        return f"{applied} & {admitted} & {pct}\%"
    for year in df["Year"].drop_duplicates().sort_values():
        replace_strings[count] = year
        this_year = df.query("Year == @year")
        class_code = this_year.query("Class=='Freshmen'")
        replace_strings[count+1] = comp_string(class_code.query("Residency=='Resident'"))
        replace_strings[count+2] = comp_string(class_code.query("Residency=='Non-Resident'"))
        replace_strings[count+3] = comp_string(class_code)
        class_code = this_year.query("Class=='Transfer'")
        replace_strings[count+4] = comp_string(class_code.query("Residency=='Resident'"))
        replace_strings[count+5] = comp_string(class_code.query("Residency=='Non-Resident'"))
        replace_strings[count+6] = comp_string(class_code)
        replace_strings[count+7] = comp_string(this_year.query("Class=='Grad/Postbacc'"))

        replace_strings[count+8] = comp_string(this_year)
        count += 9
    print(replace_strings)
    with open("Templates/app_admit_fall.pytex","r") as template:
        filedata = template.read()
    for key, value in replace_strings.items():
        filedata = filedata.replace(f'@@{{{key}}}',str(value))
    with open("Generated/app_admit_fall.tex","w+") as generated:
        generated.write(filedata)

# Add more generators for other files here
generators = {
    'app_admit_fall': app_admit_fall,
}

# Using generators allows us to call specific files from Make,
# which means that we can re-generate only changed files. 
if __name__ == '__main__':
    if len(sys.argv)<2:
        exit(1)
    elif sys.argv[1] in generators:
        generators[sys.argv[1]]()
    else:
        exit(1)

并且,出于 MVE 目的,这是我的 CSV(Data/app_admit_fall.csv):

Year,Decision,Residency,Class,Count
2019,Applied,Resident,Freshmen,1143
2019,Applied,Resident,Transfer,404
2019,Applied,,Grad/Postbacc,418
2019,Applied,Non-Resident,Freshmen,1371
2019,Applied,Non-Resident,Transfer,371
2019,Admitted,Resident,Freshmen,918
2019,Admitted,Resident,Transfer,358
2019,Admitted,,Grad/Postbacc,311
2019,Admitted,Non-Resident,Freshmen,1048
2019,Admitted,Non-Resident,Transfer,313
2020,Applied,Resident,Freshmen,1094
2020,Applied,Resident,Transfer,406
2020,Applied,,Grad/Postbacc,374
2020,Applied,Non-Resident,Freshmen,1223
2020,Applied,Non-Resident,Transfer,356
2020,Admitted,Resident,Freshmen,1003
2020,Admitted,Resident,Transfer,354
2020,Admitted,,Grad/Postbacc,282
2020,Admitted,Non-Resident,Freshmen,1090
2020,Admitted,Non-Resident,Transfer,288

相关内容