我正在寻找一种方法来自动计算 pgfplots 中表格数据的标准误差椭圆。从某种意义上说,这是对此的概括问题和这个
我假设有一个包含数据对等的 CSV 文件(x1,y1)
,(x2,y2)
例如,每个数据对都可以是来自二维高斯随机变量的样本。(实际上,它们是两个被认为是相关的量的测量数据)。
根据测量数据(或样本)(xi,yi)
,我想自动确定它们的 1 标准误差椭圆(请参见例如这里)
\documentclass[tikz]{standalone}
\usepackage{pgfplots}
\usepackage{filecontents}
\begin{filecontents*}{samplesFrom2Ddistribution.csv}
x1,y1,y2,x2
2.2696486417166,3.33528373437017,3.95172117455669,2.47586058727834
2.49428705537941,4.51953361456008,5.82446537288901,3.4122326864445
0.516878977484101,0.934410297385335,3.0452169686192,2.0226084843096
0.979735614317035,1.72143506545539,2.90426117955959,1.95213058977979
1.55300498925547,2.42732047804668,6.40266930854992,3.70133465427496
2.1096585913276,3.54549765900551,1.98057657446515,1.49028828723257
3.12873645202828,4.78114693689773,2.99429007971136,1.99714503985568
1.71003695919997,4.97895855494968,4.83973415961279,2.9198670798064
3.26155071814115,3.88196452335622,3.29961746526552,2.14980873263276
2.47542481170727,2.10726710573745,5.80986689135395,3.40493344567698
\end{filecontents*}
\begin{document}
\begin{tikzpicture}
\begin{axis}
\addplot table [???] {samplesFrom2Ddistribution.csv};
\end{axis}
\end{tikzpicture}
\end{document}
我正在寻找一种方法让 pgfplots 从上面的表格数据中计算平均值和标准误差椭圆,以获得类似于上面第二个链接中显示的结果:
图像显示了 2(xi,yi)
对(每对在 CSV 中有 10 个数据点),我想要确定每对的标准误差椭圆。
答案1
画这些椭圆很简单,但也很繁琐。我当时正在这篇维基百科文章而不是你的链接。这个答案带有一种风格error ellipse=for column <col> of <file>
,例如
\draw[blue,fill=blue!20,error ellipse=for column 2 of samplesFrom2Ddistribution.csv];
这就是 MWE。
\documentclass[tikz,border=3.14mm]{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
% smuggling from https://tex.stackexchange.com/a/470979/121799
\newcounter{smuggle}
\DeclareRobustCommand\smuggleone[1]{%
\stepcounter{smuggle}%
\expandafter\global\expandafter\let\csname smuggle@\arabic{smuggle}\endcsname#1%
\aftergroup\let\aftergroup#1\expandafter\aftergroup\csname smuggle@\arabic{smuggle}\endcsname
}
\DeclareRobustCommand\smuggle[2][1]{%
\smuggleone{#2}%
\ifnum#1>1
\aftergroup\smuggle\aftergroup[\expandafter\aftergroup\the\numexpr#1-1\aftergroup]\aftergroup#2%
\fi
}
\usepackage{filecontents}
\begin{filecontents*}{samplesFrom2Ddistribution.csv}
x1,y1,y2,x2
2.2696486417166,3.33528373437017,3.95172117455669,2.47586058727834
2.49428705537941,4.51953361456008,5.82446537288901,3.4122326864445
0.516878977484101,0.934410297385335,3.0452169686192,2.0226084843096
0.979735614317035,1.72143506545539,2.90426117955959,1.95213058977979
1.55300498925547,2.42732047804668,6.40266930854992,3.70133465427496
2.1096585913276,3.54549765900551,1.98057657446515,1.49028828723257
3.12873645202828,4.78114693689773,2.99429007971136,1.99714503985568
1.71003695919997,4.97895855494968,4.83973415961279,2.9198670798064
3.26155071814115,3.88196452335622,3.29961746526552,2.14980873263276
2.47542481170727,2.10726710573745,5.80986689135395,3.40493344567698
\end{filecontents*}
\begin{document}
\tikzset{error ellipse/.style args={for column #1 of #2}{%
/utils/exec={\pgfplotstableread[col sep=comma]{#2}\datatable
\def\mycol{#1}
\pgfplotstablegetrowsof{\datatable}
\pgfmathtruncatemacro{\rownum}{\pgfplotsretval}
\pgfmathsetmacro{\sumx}{0}
\pgfplotstableforeachcolumnelement{x\mycol}\of\datatable\as\cell{
\ifnum\pgfplotstablerow=0
\edef\lstx{\cell}
\else
\edef\lstx{\lstx,\cell}
\fi
\pgfmathsetmacro{\sumx}{\sumx+\cell}
\smuggle{\sumx}
\smuggle{\lstx}
}
\pgfmathsetmacro{\meanx}{\sumx/\rownum}
\pgfmathsetmacro{\sumy}{0}
\pgfplotstableforeachcolumnelement{y\mycol}\of\datatable\as\cell{
\ifnum\pgfplotstablerow=0
\edef\lsty{\cell}
\else
\edef\lsty{\lsty,\cell}
\fi
\pgfmathsetmacro{\sumy}{\sumy+\cell}
\smuggle{\sumy}
}
\pgfmathsetmacro{\meany}{\sumy/\rownum}
\pgfmathsetmacro{\matxx}{0}
\pgfmathsetmacro{\matxy}{0}
\pgfmathsetmacro{\matyy}{0}
\foreach \X [count=\Z starting from 0] in \lstx
{\pgfmathsetmacro{\Y}{{\lsty}[\Z]}
\pgfmathsetmacro{\matxx}{\matxx+(\X-\meanx)*(\X-\meanx)}
\pgfmathsetmacro{\matxy}{\matxy+(\X-\meanx)*(\Y-\meany)}
\pgfmathsetmacro{\matyy}{\matyy+(\Y-\meany)*(\Y-\meany)}
\smuggle[2]{\matxx}
\smuggle[2]{\matxy}
\smuggle[2]{\matyy}
}
\pgfmathsetmacro{\mytheta}{atan2(2*\matxy,(\matxx-\matyy))/2}
\pgfmathsetmacro{\mya}{sqrt((\matxx+\matyy+sqrt((\matxx-\matyy)*(\matxx-\matyy)+4*\matxy*\matxy))/2)}
\pgfmathsetmacro{\myb}{sqrt((\matxx+\matyy-sqrt((\matxx-\matyy)*(\matxx-\matyy)+4*\matxy*\matxy))/2)}
%\typeout{\matxx,\matxy,\matyy,\mya,\myb,\mytheta}
},
insert path={plot[variable=\x,domain=0:360,smooth]
({\meanx+\mya*cos(\mytheta)*cos(\x)-\myb*sin(\mytheta)*sin(\x)},
{\meany+\mya*sin(\mytheta)*cos(\x)+\myb*cos(\mytheta)*sin(\x)})}}}
\begin{tikzpicture}
\begin{axis}[xmin=-3,xmax=7,ymin=-2,ymax=10]
\draw[red,fill=red!20,error ellipse=for column 1 of samplesFrom2Ddistribution.csv];
\draw[blue,fill=blue!20,error ellipse=for column 2 of samplesFrom2Ddistribution.csv];
\addplot[color=red,only marks] table [x=x1,y=y1,col sep=comma] {samplesFrom2Ddistribution.csv};
\addplot[color=blue,only marks] table [x=x2,y=y2,col sep=comma] {samplesFrom2Ddistribution.csv};
\end{axis}
\end{tikzpicture}
\end{document}
不幸的是,我认为您显示的椭圆与您提供的数据没有太大关系。特别是,第二个数据集全部位于一条线上。对上述代码进行一个很好的交叉检查是它确实找到了一个非常窄的椭圆。问题附带的屏幕截图没有显示窄椭圆。(我想知道是否应该有一个因子 2,椭圆应该缩小该因子。在这个版本中,我遵循了我认为是维基百科的惯例。当然,除以这个因子会很简单。)