我有两个表(A 和 B),它们都有一个字符串值列和一个整数值列。我实际上想要的是按表 A 的值排序的已排序表 B。关键点是行由字符串值而不是索引标识,并且 B 不包含 A 拥有的所有行(否则我可以先根据字符串列进行排序)。
\documentclass{article}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\def\histocomp#1 sort on #2{
\pgfplotstableread{#2}\sortOnTable
\begin{tikzpicture}
\begin{axis}
\pgfplotstablesort[sort key=index,sort key from=\sortOnTable]\loadedtable\sortOnTable
\addplot+ [no markers] table[x expr=\coordindex, y index=1] \loadedtable;
\pgfplotstableread{#1}\loadedtable
\pgfplotstablesort[sort key=index,sort key from=\sortOnTable]\loadedtable\loadedtable
\addplot+ [only marks] table[x expr=\coordindex, y index=1] \loadedtable;
\end{axis}
\end{tikzpicture}
}
\begin{document}
\histocomp{B} sort on {A}
\end{document}
A:
index values
foo 0.5
bar 0.2
baz 0.7
乙:
index values
bar 0.8
baz 0.5
是否存在外连接之类的东西?
答案1
如果我理解正确的话,您希望对表进行排序B
,使其根据values
A 中与和index
连接的相应列进行排序。在A
B
datatool
您可以提供一个\dtlsort
用于比较的处理程序。在下面的示例中,我定义了一个处理程序宏,用于查找 中的相应值A
。这假设以下 CSV 文件:
数据1.csv:
index,values
foo,0.5
bar,0.2
baz,0.7
wibble,0.1
数据2.csv:
index,values
bar,0.8
wibble,0.6
baz,0.5
平均能量损失:
\documentclass{article}
\usepackage{etoolbox}
\usepackage[verbose]{datatool}
\DTLloaddb{A}{data1.csv}
\DTLloaddb{B}{data2.csv}
% Define a comparison handler
\newcommand{\joincompare}[3]{%
\ifstrequal{#2}{#3}%
{%
#1=0\relax
}%
{%
% Fetch the row in A that corresponds to index "#2"
\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#2}%
% Fetch the "values" field in that row
\dtlgetentryfromcurrentrow{\valueI}{\dtlcolumnindex{A}{values}}%
% Fetch the row in A that corresponds to index "#3"
\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#3}%
% Fetch the "values" field in that row
\dtlgetentryfromcurrentrow{\valueII}{\dtlcolumnindex{A}{values}}%
% Compare the values
% (This assumes values are plain numbers with a full stop as the
% decimal point and no number group separators. If this isn't the
% case you'll need to use \DTLifnumlt instead of \dtlifnumlt.)
\dtlifnumlt{\valueI}{\valueII}{#1=-1\relax}{#1=1\relax}%
}%
}
\begin{document}
Unsorted table~B data:
\begin{center}
\DTLdisplaydb{B}
\end{center}
Unsorted table~A data:
\begin{center}
\DTLdisplaydb{A}
\end{center}
\dtlsort{index}{B}{\joincompare}
Sorted table~B data:
\begin{center}
\DTLdisplaydb{B}
\end{center}
Merged tables:
\begin{center}
\begin{tabular}{lll}%
\bfseries Index & \bfseries Values (B) & \bfseries Values (A)
\DTLforeach*{B}{\Index=index,\Values=values}{%
\\%
\Index & \Values &
\edef\dogetrow{\noexpand\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}}%
\dogetrow
\dtlgetentryfromcurrentrow{\valuesA}{\dtlcolumnindex{A}{values}}%
\valuesA
}%
\end{tabular}%
\end{center}
\end{document}
得出的结果为:
然而我认为最好将数据存储在 SQL 表中,然后使用 SQL 语句提取所需数据。您可以编写一个脚本来执行此操作,并将相应的结果写入 TeX 可以读取的文件中,或者您可以使用datatooltk
将信息(使用 SQL SELECT 语句)拉入datatool
数据库文件。如果您有大量数据,情况尤其如此,因为 TeX 并非设计为结构化查询语言。
更新:
下面的示例是上述示例的修改,它还显示数据,首先在散点图中,然后在条形图中。为此,更容易将相关值从数据库附加A
到数据库B
:
\documentclass{article}
\usepackage{etoolbox}
\usepackage{datatool}
\usepackage{dataplot,databar}
\DTLloaddb{A}{data1.csv}
\DTLloaddb{B}{data2.csv}
% Define a comparison handler
\newcommand{\joincompare}[3]{%
\ifstrequal{#2}{#3}%
{%
#1=0\relax
}%
{%
% Fetch the row in A that corresponds to index "#2"
\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#2}%
% Fetch the "values" field in that row
\dtlgetentryfromcurrentrow{\valueI}{\dtlcolumnindex{A}{values}}%
% Fetch the row in A that corresponds to index "#3"
\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#3}%
% Fetch the "values" field in that row
\dtlgetentryfromcurrentrow{\valueII}{\dtlcolumnindex{A}{values}}%
% Compare the values
% (This assumes values are plain numbers with a full stop as the
% decimal point and no number group separators. If this isn't the
% case you'll need to use \DTLifnumlt instead of \dtlifnumlt.)
\dtlifnumlt{\valueI}{\valueII}{#1=-1\relax}{#1=1\relax}%
}%
}
\begin{document}
Unsorted table~B data:
\begin{center}
\DTLdisplaydb{B}
\end{center}
Unsorted table~A data:
\begin{center}
\DTLdisplaydb{A}
\end{center}
\dtlsort{index}{B}{\joincompare}
Sorted table~B data:
\begin{center}
\DTLdisplaydb{B}
\end{center}
Append ``values'' column from A to B (only for those rows that
are in B). Call the new column ``valuesA'':
\dtlforcolumn{\Index}{B}{index}{%
\edef\dogetrow{\noexpand\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}}%
\dogetrow
\dtlgetentryfromcurrentrow{\valuesA}{\dtlcolumnindex{A}{values}}%
\edef\dogetrow{\noexpand\dtlgetrowforvalue{B}{\dtlcolumnindex{A}{index}}{\Index}}%
\dogetrow
\dtlappendentrytocurrentrow{valuesA}{\valuesA}%
\dtlrecombine
}%
Updated table B:
\begin{center}
\DTLdisplaydb{B}
\end{center}
Plot of ``values'' column against ``valuesA'' column shown in
figure~\ref{fig:plot}. A bar chart is shown in
figure~\ref{fig:bar}.
\begin{figure}[htbp]
\centering
% round values on x tics to 1 d.p.
\setcounter{DTLplotroundXvar}{1}%
% similarly for the y axis
\setcounter{DTLplotroundYvar}{1}%
\DTLplot
{B}% database name
{%
x=values,% column to use for x coords
y=valuesA,% column to use for y coords
xlabel={Values in column ``values''},% x axis label
ylabel={Values in column ``valuesA''},% y axis label
style=markers,% scatter plot (rather than line plot)
marks={\pgfuseplotmark{x}},% plot markers
ticdir=out,% direction of tics
xticgap=0.1,% increment between tic marks on x axis
yticgap=0.1,% increment between tic marks on y axis
width=3in,% width of x-axis
height=3in,% height of y-axis
}%
\caption{Plot of Values in Database ``B''}
\label{fig:plot}
\end{figure}
\begin{figure}[htbp]
\centering
% set the width of each bar:
\setlength{\DTLbarwidth}{5mm}
% set the colour of the bars:
\DTLsetbarcolor{1}{blue}%
\DTLsetbarcolor{2}{red}%
% Display the bar chart:
\DTLmultibarchart
{% bar chart settings
variables={\ValuesB,\ValuesA},% plot variables
length=2in,% length of Y axis
axes=both,% show both axes
uppermultibarlabels={B,A},% upper bar labels
barlabel={\Index},% bar group labels
verticalbars=true,% vertical orientation
}%
{B}% database name
{\ValuesB=values,\ValuesA=valuesA,\Index=index}% assignment list
\caption{Bar Chart of Values in Database ``B''}
\label{fig:bar}
\end{figure}
\end{document}
这产生了以下情节:
和条形图:
警告:TeX 是一种排版语言,而不是结构化查询语言或数学语言,因此,将所有这一切都用 TeX 来完成比将 TeX 与 SQL 和 R/Matlab/Octave 结合起来要慢。
请注意,使用最新版本datatool
(v2.17),您可以替换
\edef\dogetrow{\noexpand\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}}%
\dogetrow
和
\edtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}%
(对于 也类似B
)