Finding the positions of blank lines in a .csv file

Answer 1

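The xstring package may help. The code below reads the whole file into a macro with \endlinechar set to character 0, so every blank line shows up as two consecutive ^^00 characters. Counting and cutting at those pairs gives the first and last data line of each block (the two header lines of a block are skipped):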

\documentclass[a4paper,12pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{filecontents,xstring}
\begin{filecontents}{data.csv}
"Amplitude","notes: data set 1",
X,Y,
1,1,
2,2,
3,3,
4,4,

"Amplitude","notes: data set 2",
X,Y,
1,7,
2,6,
3,5,
4,4,
5,3,
6,2,
7,1,

"Amplitude","notes: data set 3",
X,Y,
1,2
3,4
5,6,

"CH1","notes: data set 1",
"CH1","notes: data set 2",
\end{filecontents}
\newcount\cntblanklines
\newcount\currentdataline
\begingroup
    \catcode0 12
    \begingroup\everyeof{\noexpand}\endlinechar0\xdef\datas{\csname @@input\endcsname data.csv }\endgroup
    \StrCount\datas{^^00^^00}[\nbblanklines]\global\let\nbblanklines\nbblanklines
    \loop
        \ifnum\cntblanklines<\nbblanklines\relax
        \advance\cntblanklines1
        \StrCut\datas{^^00^^00}\currentdatas\datas
        \StrCount\currentdatas{^^00}[\currentblocklength]%
        \expandafter\xdef\csname blockbegin\romannumeral\cntblanklines\endcsname{\number\numexpr\currentdataline+3}%
        \expandafter\xdef\csname   blockend\romannumeral\cntblanklines\endcsname{\number\numexpr\currentblocklength+1+\currentdataline}%
        \advance\currentdataline\numexpr\currentblocklength+2\relax
    \repeat
\endgroup%
\begin{document}
Number of blank lines : \nbblanklines

Block 1 : \blockbegini--\blockendi

Block 2 : \blockbeginii--\blockendii

Block 3 : \blockbeginiii--\blockendiii
\end{document}
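
The listing above prints the first three blocks by naming \blockbegini, \blockbeginii and so on by hand. As a rough sketch only, the same macros could be walked in a loop by rebuilding their names with \csname and \romannumeral. The counter \blockidx is a hypothetical name, and the three \def lines are placeholder values so that the fragment compiles on its own; in the real document they come from the code above.

```latex
\documentclass{article}
% Placeholder values (assumption): stand-ins for what the answer's loop defines.
\def\nbblanklines{2}
\def\blockbegini{3}  \def\blockendi{6}
\def\blockbeginii{10}\def\blockendii{16}
\newcount\blockidx % hypothetical loop counter
\begin{document}
\noindent
\loop\ifnum\blockidx<\nbblanklines\relax
  \advance\blockidx 1
  % rebuild the macro names blockbegin<roman> and blockend<roman>
  Block \the\blockidx: \csname blockbegin\romannumeral\blockidx\endcsname--%
  \csname blockend\romannumeral\blockidx\endcsname\\
\repeat
\end{document}
```

This avoids hard-coding the number of blocks in the document body, since the loop bound is the stored count \nbblanklines.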

Answer 2

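I originally modified the definition of \readdef (from the readarray package) so that it would not ignore blank lines in the input. However, I have since edited this answer to define \simplereaddef, which removes the overhead of the readarray package entirely.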

\documentclass[a4paper,12pt]{article}
\usepackage{pgfplots,filecontents,listofitems}
\makeatletter
\newread\readfile
\newcommand\simplereaddef[3][,]{%
  \catcode\endlinechar=9 %
  \def#3{}%
  \openin\readfile=#2%
  \loop\unless\ifeof\readfile%
    \read\readfile to\readfileline % Reads a line of the file into \readfileline%
    \expandafter\g@addto@macro\expandafter#3\expandafter{\readfileline}%
    \g@addto@macro#3{#1}% ADD record-delim TO END OF EACH RECORD
  \repeat%
  \closein\readfile%
  \catcode\endlinechar=5 %
}
\makeatother
\begin{filecontents*}{mydata.csv}
"Amplitude","notes: data set 1",
X,Y,
1,1,
2,2,
3,3,
4,4,

"Amplitude","notes: data set 2",
X,Y,
1,7,
2,6,
3,5,
4,4,
5,3,
6,2,
7,1,

"CH1","notes: data set 1",
"CH1","notes: data set 2",
\end{filecontents*}
\begin{document}
\simplereaddef[\\]{mydata.csv}\mydata
\setsepchar{\\}
\readlist\myarray{\mydata}
\noindent\foreachitem\x\in\myarray[]{%
  \ifnum\xcnt<\listlen\myarray[]\relax%
    \expandafter\ifx\expandafter\relax\x\relax Line \xcnt{} blank\\\fi%
  \fi}
\end{document}
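
The test above relies on a blank input line producing an empty record. A minimal sketch of the same idea without listofitems, using only \read, might look as follows. The names \csvfile, \csvlineno, \csvline, \reportblanklines and the file blankdemo.csv are hypothetical and not part of the answer; with \endlinechar set to -1 nothing is appended to a line, so a blank line reads as an empty macro.

```latex
\begin{filecontents*}{blankdemo.csv}
X,Y,
1,1,

2,2,
\end{filecontents*}
\documentclass{article}
\newread\csvfile     % hypothetical names, not from the answer
\newcount\csvlineno
\newcommand\reportblanklines[1]{%
  \begingroup
  \csvlineno=0
  \endlinechar=-1 % nothing is appended to a line, so a blank line reads as empty
  \openin\csvfile=#1
  \loop\unless\ifeof\csvfile
    \read\csvfile to\csvline
    \unless\ifeof\csvfile % the read that hits end of file yields a phantom empty line
      \advance\csvlineno 1
      \ifx\csvline\empty Line \the\csvlineno\ is blank\\\fi
    \fi
  \repeat
  \closein\csvfile
  \endgroup
}
\begin{document}
\noindent\reportblanklines{blankdemo.csv}
\end{document}
```

The inner \ifeof check plays the same role as the \ifnum\xcnt<\listlen\myarray[] guard in the listing above: it keeps the extra empty record at the end of the file from being reported as a blank line.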

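Supplement

The overall goal is not just to find the blank records in the file, but to use them to obtain the sublists of the original file that the blank records separate. Here is one way to do that, edited to use the nested lists of listofitems.

In the \sublist output, I also prefix each sub-record with a leading "#)" (its record number followed by a parenthesis) to show that a sublist is not just a block of text: each record inside it is individually accessible.

Edited to use the \simplereaddef macro rather than a modified \readdef from the readarray package.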

\documentclass[a4paper,12pt]{article}
\usepackage{pgfplots,filecontents,pgffor,listofitems}
\makeatletter
\newread\readfile
\newcommand\simplereaddef[3][,]{%
  \catcode\endlinechar=9 %
  \def#3{}%
  \openin\readfile=#2%
  \loop\unless\ifeof\readfile%
    \read\readfile to\readfileline % Reads a line of the file into \readfileline%
    \expandafter\g@addto@macro\expandafter#3\expandafter{\readfileline}%
    \g@addto@macro#3{#1}% ADD record-delim TO END OF EACH RECORD
  \repeat%
  \closein\readfile%
  \catcode\endlinechar=5 %
}
\makeatother
\begin{filecontents*}{mydata.csv}
"Amplitude","notes: data set 1",
X,Y,
1,1,
2,2,
3,3,
4,4,

"Amplitude","notes: data set 2",
X,Y,
1,7,
2,6,
3,5,
4,4,
5,3,
6,2,
7,1,

"CH1","notes: data set 1",
"CH1","notes: data set 2",
\end{filecontents*}
\newcommand\sublist[1]{SUBLIST #1:\\\foreachitem\x\in\myarray[#1]{\xcnt) \x\\}\par}
\begin{document}
\simplereaddef[\\]{mydata.csv}\mydata% OPTIONAL ARG IS record-delim (DEFAULT ,)
\setsepchar{\\\\/\\}% OF FORM {2X record-delim / record-delim}
\ignoreemptyitems
\readlist\myarray{\mydata}
Number of non-empty sublists: \listlen\myarray[]

\sublist{1}

\sublist{3}

\sublist{2}
\end{document}
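
The nested-separator setup used above can be seen in isolation on a toy list. This is only a sketch under stated assumptions: the list name \demo and the inline data are made up, with ";" standing in for the record delimiter and ";;" for a blank record.

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
% Toy input (assumption): ";" is the record delimiter, ";;" marks a blank record.
\setsepchar{;;/;}% level 1 splits at ";;", level 2 splits at ";"
\ignoreemptyitems
\readlist\demo{a;b;;c;d;e;;f}
Number of blocks: \listlen\demo[]

Records in block 2: \listlen\demo[2]

Third record of block 2: \demo[2,3]
\end{document}
```

The first-level separator is the doubled record delimiter, which is exactly what two consecutive line ends look like once \simplereaddef has appended a delimiter to every record.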

Answer 3

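This is an incomplete collection of thoughts. It merely reports that pgfplots does come with a way of handling empty lines, and that one can inject something into the scanline option (empty line=scanline, see p. 45 of the manual). The main thing I propose here is to make use of this information by adding

  \xdef\BlockLength{\pgfplots@scanlinelength}%

to \pgfplotsscanlinelength@scanline@complete in order to keep track of the block lengths. The code below runs through the file, works out the lengths of the blocks (including the headers), and records them. This is neither elegant nor well tested, let alone a complete answer, but it seems to pass some very basic checks.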

\documentclass[a4paper,12pt]{article}
\usepackage{pgfplots,filecontents}

\begin{filecontents*}{data.csv}
"Amplitude","notes: data set 1",
X,Y,
1,1,
2,2,
3,3,
4,4,

"Amplitude","notes: data set 2",
X,Y,
1,7,
2,6,
3,5,
4,4,
5,3,
6,2,
7,1,

"Amplitude","notes: data set 3",
X,Y,
3,5,
4,4,
5,3,
6,2,
7,1,

"Amplitude","notes: data set 4",
X,Y,
3,5,
4,4,
5,3,
6,2,
6,2,
7,1,

"CH1","notes: data set 1",
"CH1","notes: data set 2",
\end{filecontents*}
\makeatletter
\def\pgfplotsscanlinelength@scanline@complete{%
        \ifnum\pgfplots@scanlinelength>0
                \ifnum\c@pgfplots@scanlineindex=0
                        %
                        % \pgfplotsscanlinecomplete
                        % \pgfplotsscanlinecomplete
                        % \pgfplotsscanlinecomplete
                        % should have the same effect as a single statement. Do
                        % nothing here.
                \else
                        \ifnum\pgfplots@scanlinelength=\c@pgfplots@scanlineindex\relax
                        \else
%\message{Found inconsistent scan line length: \pgfplots@scanlinelength\space vs. \the\c@pgfplots@scanlineindex\space near line \pgfplotstablelineno.}%
                                % special marker which means 'inconsistent scan line length found'
                                \def\pgfplots@scanlinelength{-2}%
                        \fi
                        \pgfplotsplothandlernotifyscanlinecomplete
                \fi
        \else
                \ifnum\pgfplots@scanlinelength=-2
                \else
                        \edef\pgfplots@scanlinelength{\the\c@pgfplots@scanlineindex}%
                        \xdef\BlockLength{\pgfplots@scanlinelength}%
                \fi
                %
                \ifnum\c@pgfplots@scanlineindex>0
                        \pgfplotsplothandlernotifyscanlinecomplete
                \fi
        \fi
        \c@pgfplots@scanlineindex=0
        \pgfplotsutil@advancestringcounter\pgfplotsscanlineindex%
}
\makeatother
\newsavebox{\NonSense}
\begin{document}
\begin{lrbox}{\NonSense}
\begin{tikzpicture}
\pgfplotstableread[comment chars={"}]{data.csv}\loadedtable
\pgfplotstablegetrowsof{\loadedtable}
\pgfmathtruncatemacro{\DataLines}{\pgfplotsretval+1}
\typeout{\DataLines}
\def\SkipLength{0}
\foreach \X in {0,...,12}
{\begin{axis} 
    \addplot[empty line=scanline] table[x expr=0,y expr=0,skip first n=\SkipLength] {data.csv};
\end{axis}
\ifnum\X=0
\xdef\LstBlocks{\BlockLength}
\pgfmathtruncatemacro{\SkipLength}{\BlockLength+2}
\xdef\SkipLength{\SkipLength}
\else
\xdef\LstBlocks{\LstBlocks,\BlockLength}
\pgfmathtruncatemacro{\SkipLength}{\SkipLength+\BlockLength+2}
\xdef\SkipLength{\SkipLength}
\fi
\ifnum\SkipLength>\DataLines
\breakforeach
\fi
}
\end{tikzpicture}
\end{lrbox}
blocks:\LstBlocks
\end{document}

So the point of this post is to report the empty line feature, in the hope that some expert may find it useful for a real and complete solution.
