Reading a CSV file with quoted fields

2024-5-24

babel catcodes csvsimple


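I'm trying to read a CSV file into LaTeX and generate a new page for each line of the file. Unfortunately, the file quotes every field, uses semicolons as separators, and has underscores in the header names. I have figured out how to handle the last two issues. To deal with the quoting, it seems necessary to change the catcode of the quote character so that the quotes are ignored (which seems a bit dangerous to me in case one of the quoted fields contains a semicolon, but that's another story).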
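To make that concern concrete: with \catcode`\"=9 the quotes are discarded rather than interpreted, so in the hypothetical record below (not from the original file) the quotes no longer protect the separator. The line would presumably reach csvsimple as 1;Bar; Foo, i.e. three fields where the header declares two.

```latex
% Hypothetical input, not from the original file:
% a quoted field that contains the separator character
\begin{filecontents*}{\jobname.csv}
  "ID";"USER_NAME"
  "1";"Bar; Foo"
\end{filecontents*}
```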

However, when I load the babel package with the ngerman option, the following MWE fails to compile and produces the error message

! Improper alphabetic constant.
<to be read again> 
                   \active 
l.2 \catcode `"\active

?

Commenting out the offending line makes the example compile without problems.

\begin{filecontents*}{\jobname.csv}
  "ID";"USER_NAME"
  "1";"Foo Bar"
  "2";"Baz Qüx"
\end{filecontents*}

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel} % <- MWE compiles without this line

\usepackage{csvsimple}

\newcommand*{\makepage}[2]{
  ID: #1 \\
  User: #2 \newpage
}

\begin{document}

\csvreader[
  head to column names,
  /csv/separator=semicolon,
  before reading={\catcode`\"=9}
]{\jobname.csv}{USER_NAME=\username}{\makepage{\ID}{\username}}

\end{document}

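Edit: The solution suggested below fixes the compilation error. However, when I apply the fix to display the rows of the CSV file in a tabularx environment, every entry after the first row now shows the quotation marks.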

\begin{filecontents*}{\jobname.csv}
  "ID";"FIRST_NAME";"FAMILY_NAME"
  "1";"Foo";"Bar"
  "2";"Baz";"Qüx"
  "3";"Quux";"Quuz Corge"
\end{filecontents*}

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amssymb}
\usepackage[ngerman]{babel}
\usepackage{tabularx}
\usepackage{csvsimple}

\begin{document}

\begin{tabularx}{\linewidth}{|l|l|l|c|X|}
  \hline
  \textbf{\#} & \textbf{ID} & \textbf{Name} & \textbf{Check} &
  \textbf{Comment}
  \csvreader[
    head to column names,
    /csv/separator=semicolon,
    before reading={\catcode`\"=9},
    after reading={\catcode`\"=13}
  ]{\jobname.csv}{
    ID=\userid,
    FIRST_NAME=\firstname,
    FAMILY_NAME=\familyname
  }{\\\hline \thecsvrow & \userid & \firstname~\familyname & $\square$ &}
  \\\hline
\end{tabularx}

\end{document}

Answer 1

This happens because babel writes the following line to the .aux file (which is read back when \end{document} is executed):

\catcode`"\active

but you used \catcode`\"=9, so the " is ignored and the line essentially becomes:

\catcode`\active

which raises the Improper alphabetic constant error. The \catcode change is not made inside any group, so its effect lasts until the catcode is changed again.
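To observe this scoping directly, here is a minimal diagnostic sketch (not part of the original answer; the \typeout messages are only for illustration). It prints the current catcode of " to the terminal and log at several points. With ngerman, " is normally active (catcode 13) in the document body because of babel's shorthands.

```latex
\documentclass{article}
\usepackage[ngerman]{babel}
\begin{document}
% \the\catcode`\" expands to the current catcode of the double quote
\typeout{At begin document: \the\catcode`\"}% normally 13 (babel shorthand)
\begingroup
  \catcode`\"=9 % local change, undone automatically at \endgroup
  \typeout{Inside the group: \the\catcode`\"}
\endgroup
\typeout{After the group: \the\catcode`\"}% back to the previous value
\catcode`\"=9 % no group: stays in force, also when the .aux is read at \end{document}
\typeout{After the ungrouped change: \the\catcode`\"}
\catcode`\"=13 % reactivate " before \end{document}, as this answer's fix does
Text.
\end{document}
```

Removing the final \catcode`\"=13 line should bring back the original error when the .aux file is read.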

This would not happen if babel used a backslash before the ": \catcode`\"\active
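A minimal sketch of why the backslash matters, deliberately without babel and with a made-up definition for the active " (both are assumptions for this demo). \" is read as a single control-symbol token, so ignoring the " character does not break it.

```latex
\documentclass{article}
\begin{document}
\catcode`\"=9 % from here on, every " character is ignored
% \catcode`"\active would fail at this point: the " is dropped,
% leaving \catcode`\active and the Improper alphabetic constant error
\catcode`\"\active % works: \" is one control-symbol token, whatever the catcode of "
\def"{(quote)}% illustrative definition so the now-active " produces visible output
Test: "
\end{document}
```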

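To fix this, you have to "reactivate" the " character after the file has been read. You can do this with the after reading key (and make sure to write \catcode`\"=13 with the backslash):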

\begin{filecontents*}{\jobname.csv}
  "ID";"USER_NAME"
  "1";"Foo Bar"
  "2";"Baz Qüx"
\end{filecontents*}

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel} % <- this line triggered the error before the fix

\usepackage{csvsimple}

\newcommand*{\makepage}[2]{
  ID: #1 \\
  User: #2 \newpage
}

\begin{document}

\csvreader[
  head to column names,
  /csv/separator=semicolon,
  before reading={\catcode`\"=9},
  after reading={\catcode`\"=13},
]{\jobname.csv}{USER_NAME=\username}{\makepage{\ID}{\username}}

\end{document}

Answer 2

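I found a solution through more trial and error. Instead of using the before reading and after reading options, I wrapped the tabularx environment in its own group and changed the catcode inside it. This seems to fix all my problems. Maybe someone can explain why the other approach doesn't work here, but for now I'm happy.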

\begin{filecontents*}{\jobname.csv}
  "ID";"FIRST_NAME";"FAMILY_NAME"
  "1";"Foo";"Bar"
  "2";"Baz";"Qüx"
  "3";"Quux";"Quuz Corge"
\end{filecontents*}

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amssymb}
\usepackage[ngerman]{babel}
\usepackage{tabularx}
\usepackage{csvsimple}

\begin{document}

\begingroup\catcode`\"=9 % local: " is ignored only until \endgroup
\begin{tabularx}{\linewidth}{|l|l|l|c|X|}
  \hline
  \textbf{\#} & \textbf{ID} & \textbf{Name} & \textbf{Check} &
  \textbf{Comment}
  \csvreader[
    head to column names,
    /csv/separator=semicolon
  ]{\jobname.csv}{
    ID=\userid,
    FIRST_NAME=\firstname,
    FAMILY_NAME=\familyname
  }{\\\hline \thecsvrow & \userid & \firstname~\familyname & $\square$ &}
  \\\hline
\end{tabularx}
\endgroup

\end{document}
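The same group-local approach should also work for the first MWE in the question (the one without tabularx). The following is a sketch under that assumption; the placement of \begingroup and \endgroup around \csvreader is my adaptation and does not come from either answer. Because the catcode change ends with the group, " is active again before babel's line in the .aux file is read, so no after reading reset is needed.

```latex
\begin{filecontents*}{\jobname.csv}
  "ID";"USER_NAME"
  "1";"Foo Bar"
  "2";"Baz Qüx"
\end{filecontents*}

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}

\usepackage{csvsimple}

\newcommand*{\makepage}[2]{
  ID: #1 \\
  User: #2 \newpage
}

\begin{document}

\begingroup
\catcode`\"=9 % local: " is ignored only while the file is read
\csvreader[
  head to column names,
  /csv/separator=semicolon
]{\jobname.csv}{USER_NAME=\username}{\makepage{\ID}{\username}}
\endgroup % " gets back its previous (active) catcode here

\end{document}
```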
