将 CSV 读取为交叉表/数据透视表

Question

我不知道如何在纯 TeX 中做到这一点，所以这里有一个包含各种程序的工具链可以解决这个问题：

Directory structure:
/
+-Data
| +-app_admit_fall.csv
+-Templates
| +-app_admit_fall.pytex
+-Generated
| +-app_admit_fall.tex
+-Makefile
+-generate_file.py
+-document.tex
+-document.pdf

# Makefile
document.pdf: document.tex app_admit_fall.tex
    rm document.pdf
    rubber --pdf document.tex

# Dependent on the ETL, the data, and the formatting. 
app_admit_fall.tex : generate_file.py Data/app_admit_fall.csv Templates/app_admit_fall.pytex
    python3 generate_file.py app_admit_fall

基础文件：

% document.tex
\documentclass{article}
\usepackage{booktabs}
\begin{document}
\input{Generated/app_admit_fall.tex}
\end{document}

这是 Python 用于 tex 文件的“模板”文件。您可以随意生成它。就我自己而言，我使用这种格式，@@{n}因为它几乎肯定不会在“普通”TeX 中找到

% Templates/app_admit_fall.pytex
\begin{tabular}{rrrr|rrr}
    \cmidrule{2-7}
    &\multicolumn{3}{c|}{@@{1}}& \multicolumn{3}{c}{@@{10}}\\
    &Applied&Admitted&\%&Applied&Admitted&\%\\
    \cmidrule{2-7}
    Resident Freshmen      & @@{2} & @@{11}\\
    Non-Resident Freshmen  & @@{3} & @@{12}\\
    \textbf{Total Freshmen} & @@{4} & @@{13}\\
    \cmidrule{2-7}
    Resident Transfer       & @@{5} & @@{14}\\
    Non-Resident Transfer   & @@{6} & @@{15}\\
    \textbf{Total Transfer} & @@{7} & @@{16}\\
    \cmidrule{2-7}
    Grad/PostBacc & @@{8} & @@{17}\\
    \cmidrule{2-7}
    \textbf{Grand Total} & @@{9} & @@{18}\\
    \cmidrule{2-7}
\end{tabular}

#!python3
#generate_file.py
### Written by Jakob Lovern
### This file contains functions to generate static tables from the .pytex
### templates in /Templates and the data in /Data. It is *NOT* automatic. 
import sys
import pandas as pd

# Fill the applications/admissions table from app_admit_fall.csv
# Very quick n dirty
def app_admit_fall():
    df = pd.read_csv("Data/app_admit_fall.csv")

    count = 1
    replace_strings = dict()
    def comp_string(input):
        applied = sum(input.query("Decision == 'Applied'")["Count"])
        admitted = sum(input.query("Decision == 'Admitted'")["Count"])
        pct = "%.1f"%(100*admitted/applied)
        return f"{applied} & {admitted} & {pct}\%"
    for year in df["Year"].drop_duplicates().sort_values():
        replace_strings[count] = year
        this_year = df.query("Year == @year")
        class_code = this_year.query("Class=='Freshmen'")
        replace_strings[count+1] = comp_string(class_code.query("Residency=='Resident'"))
        replace_strings[count+2] = comp_string(class_code.query("Residency=='Non-Resident'"))
        replace_strings[count+3] = comp_string(class_code)
        class_code = this_year.query("Class=='Transfer'")
        replace_strings[count+4] = comp_string(class_code.query("Residency=='Resident'"))
        replace_strings[count+5] = comp_string(class_code.query("Residency=='Non-Resident'"))
        replace_strings[count+6] = comp_string(class_code)
        replace_strings[count+7] = comp_string(this_year.query("Class=='Grad/Postbacc'"))

        replace_strings[count+8] = comp_string(this_year)
        count += 9
    print(replace_strings)
    with open("Templates/app_admit_fall.pytex","r") as template:
        filedata = template.read()
    for key, value in replace_strings.items():
        filedata = filedata.replace(f'@@{{{key}}}',str(value))
    with open("Generated/app_admit_fall.tex","w+") as generated:
        generated.write(filedata)

# Add more generators for other files here
generators = {
    'app_admit_fall': app_admit_fall,
}

# Using generators allows us to call specific files from Make,
# which means that we can re-generate only changed files. 
if __name__ == '__main__':
    if len(sys.argv)<2:
        exit(1)
    elif sys.argv[1] in generators:
        generators[sys.argv[1]]()
    else:
        exit(1)

并且，出于 MVE 目的，这是我的 CSV（Data/app_admit_fall.csv）：

Year,Decision,Residency,Class,Count
2019,Applied,Resident,Freshmen,1143
2019,Applied,Resident,Transfer,404
2019,Applied,,Grad/Postbacc,418
2019,Applied,Non-Resident,Freshmen,1371
2019,Applied,Non-Resident,Transfer,371
2019,Admitted,Resident,Freshmen,918
2019,Admitted,Resident,Transfer,358
2019,Admitted,,Grad/Postbacc,311
2019,Admitted,Non-Resident,Freshmen,1048
2019,Admitted,Non-Resident,Transfer,313
2020,Applied,Resident,Freshmen,1094
2020,Applied,Resident,Transfer,406
2020,Applied,,Grad/Postbacc,374
2020,Applied,Non-Resident,Freshmen,1223
2020,Applied,Non-Resident,Transfer,356
2020,Admitted,Resident,Freshmen,1003
2020,Admitted,Resident,Transfer,354
2020,Admitted,,Grad/Postbacc,282
2020,Admitted,Non-Resident,Freshmen,1090
2020,Admitted,Non-Resident,Transfer,288

Answer 1

我不知道如何在纯 TeX 中做到这一点，所以这里有一个包含各种程序的工具链可以解决这个问题：

Directory structure:
/
+-Data
| +-app_admit_fall.csv
+-Templates
| +-app_admit_fall.pytex
+-Generated
| +-app_admit_fall.tex
+-Makefile
+-generate_file.py
+-document.tex
+-document.pdf

# Makefile
document.pdf: document.tex app_admit_fall.tex
    rm document.pdf
    rubber --pdf document.tex

# Dependent on the ETL, the data, and the formatting. 
app_admit_fall.tex : generate_file.py Data/app_admit_fall.csv Templates/app_admit_fall.pytex
    python3 generate_file.py app_admit_fall

基础文件：

% document.tex
\documentclass{article}
\usepackage{booktabs}
\begin{document}
\input{Generated/app_admit_fall.tex}
\end{document}

这是 Python 用于 tex 文件的“模板”文件。您可以随意生成它。就我自己而言，我使用这种格式，@@{n}因为它几乎肯定不会在“普通”TeX 中找到

% Templates/app_admit_fall.pytex
\begin{tabular}{rrrr|rrr}
    \cmidrule{2-7}
    &\multicolumn{3}{c|}{@@{1}}& \multicolumn{3}{c}{@@{10}}\\
    &Applied&Admitted&\%&Applied&Admitted&\%\\
    \cmidrule{2-7}
    Resident Freshmen      & @@{2} & @@{11}\\
    Non-Resident Freshmen  & @@{3} & @@{12}\\
    \textbf{Total Freshmen} & @@{4} & @@{13}\\
    \cmidrule{2-7}
    Resident Transfer       & @@{5} & @@{14}\\
    Non-Resident Transfer   & @@{6} & @@{15}\\
    \textbf{Total Transfer} & @@{7} & @@{16}\\
    \cmidrule{2-7}
    Grad/PostBacc & @@{8} & @@{17}\\
    \cmidrule{2-7}
    \textbf{Grand Total} & @@{9} & @@{18}\\
    \cmidrule{2-7}
\end{tabular}

#!python3
#generate_file.py
### Written by Jakob Lovern
### This file contains functions to generate static tables from the .pytex
### templates in /Templates and the data in /Data. It is *NOT* automatic. 
import sys
import pandas as pd

# Fill the applications/admissions table from app_admit_fall.csv
# Very quick n dirty
def app_admit_fall():
    df = pd.read_csv("Data/app_admit_fall.csv")

    count = 1
    replace_strings = dict()
    def comp_string(input):
        applied = sum(input.query("Decision == 'Applied'")["Count"])
        admitted = sum(input.query("Decision == 'Admitted'")["Count"])
        pct = "%.1f"%(100*admitted/applied)
        return f"{applied} & {admitted} & {pct}\%"
    for year in df["Year"].drop_duplicates().sort_values():
        replace_strings[count] = year
        this_year = df.query("Year == @year")
        class_code = this_year.query("Class=='Freshmen'")
        replace_strings[count+1] = comp_string(class_code.query("Residency=='Resident'"))
        replace_strings[count+2] = comp_string(class_code.query("Residency=='Non-Resident'"))
        replace_strings[count+3] = comp_string(class_code)
        class_code = this_year.query("Class=='Transfer'")
        replace_strings[count+4] = comp_string(class_code.query("Residency=='Resident'"))
        replace_strings[count+5] = comp_string(class_code.query("Residency=='Non-Resident'"))
        replace_strings[count+6] = comp_string(class_code)
        replace_strings[count+7] = comp_string(this_year.query("Class=='Grad/Postbacc'"))

        replace_strings[count+8] = comp_string(this_year)
        count += 9
    print(replace_strings)
    with open("Templates/app_admit_fall.pytex","r") as template:
        filedata = template.read()
    for key, value in replace_strings.items():
        filedata = filedata.replace(f'@@{{{key}}}',str(value))
    with open("Generated/app_admit_fall.tex","w+") as generated:
        generated.write(filedata)

# Add more generators for other files here
generators = {
    'app_admit_fall': app_admit_fall,
}

# Using generators allows us to call specific files from Make,
# which means that we can re-generate only changed files. 
if __name__ == '__main__':
    if len(sys.argv)<2:
        exit(1)
    elif sys.argv[1] in generators:
        generators[sys.argv[1]]()
    else:
        exit(1)

并且，出于 MVE 目的，这是我的 CSV（Data/app_admit_fall.csv）：

Year,Decision,Residency,Class,Count
2019,Applied,Resident,Freshmen,1143
2019,Applied,Resident,Transfer,404
2019,Applied,,Grad/Postbacc,418
2019,Applied,Non-Resident,Freshmen,1371
2019,Applied,Non-Resident,Transfer,371
2019,Admitted,Resident,Freshmen,918
2019,Admitted,Resident,Transfer,358
2019,Admitted,,Grad/Postbacc,311
2019,Admitted,Non-Resident,Freshmen,1048
2019,Admitted,Non-Resident,Transfer,313
2020,Applied,Resident,Freshmen,1094
2020,Applied,Resident,Transfer,406
2020,Applied,,Grad/Postbacc,374
2020,Applied,Non-Resident,Freshmen,1223
2020,Applied,Non-Resident,Transfer,356
2020,Admitted,Resident,Freshmen,1003
2020,Admitted,Resident,Transfer,354
2020,Admitted,,Grad/Postbacc,282
2020,Admitted,Non-Resident,Freshmen,1090
2020,Admitted,Non-Resident,Transfer,288

将 CSV 读取为交叉表/数据透视表

答案1

相关内容