根据另一个表中的值对表进行排序

根据另一个表中的值对表进行排序

我有两个表(A 和 B),它们都有一个字符串值列和一个整数值列。我实际上想要的是按表 A 的值排序的已排序表 B。关键点是行由字符串值而不是索引标识,并且 B 不包含 A 拥有的所有行(否则我可以先根据字符串列进行排序)。

\documentclass{article}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

    \def\histocomp#1 sort on #2{

    \pgfplotstableread{#2}\sortOnTable

    \begin{tikzpicture}
    \begin{axis}
    \pgfplotstablesort[sort key=index,sort key from=\sortOnTable]\loadedtable\sortOnTable
    \addplot+ [no markers] table[x expr=\coordindex, y index=1] \loadedtable;


    \pgfplotstableread{#1}\loadedtable 
    \pgfplotstablesort[sort key=index,sort key from=\sortOnTable]\loadedtable\loadedtable
    \addplot+ [only marks] table[x expr=\coordindex, y index=1] \loadedtable;
    \end{axis}
    \end{tikzpicture}

    }

\begin{document}
\histocomp{B} sort on {A}
\end{document}

A:

index values
foo 0.5
bar 0.2
baz 0.7

乙:

index values
bar 0.8
baz 0.5

是否存在外连接之类的东西?

答案1

如果我理解正确的话,您希望对表进行排序B,使其根据valuesA 中与和index连接的相应列进行排序。在ABdatatool您可以提供一个\dtlsort用于比较的处理程序。在下面的示例中,我定义了一个处理程序宏,用于查找 中的相应值A。这假设以下 CSV 文件:

数据1.csv

index,values
foo,0.5
bar,0.2
baz,0.7
wibble,0.1

数据2.csv

index,values
bar,0.8
wibble,0.6
baz,0.5

平均能量损失

\documentclass{article}

\usepackage{etoolbox}
\usepackage[verbose]{datatool}

\DTLloaddb{A}{data1.csv}
\DTLloaddb{B}{data2.csv}

% Define a comparison handler

\newcommand{\joincompare}[3]{%
 \ifstrequal{#2}{#3}%
 {%
   #1=0\relax
 }%
 {%
   % Fetch the row in A that corresponds to index "#2"
   \dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#2}%
   % Fetch the "values" field in that row
   \dtlgetentryfromcurrentrow{\valueI}{\dtlcolumnindex{A}{values}}%
   % Fetch the row in A that corresponds to index "#3"
   \dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#3}%
   % Fetch the "values" field in that row
   \dtlgetentryfromcurrentrow{\valueII}{\dtlcolumnindex{A}{values}}%
   % Compare the values
   % (This assumes values are plain numbers with a full stop as the
   % decimal point and no number group separators. If this isn't the
   % case you'll need to use \DTLifnumlt instead of \dtlifnumlt.)
   \dtlifnumlt{\valueI}{\valueII}{#1=-1\relax}{#1=1\relax}%
 }%
}

\begin{document}

Unsorted table~B data:
\begin{center}
\DTLdisplaydb{B}
\end{center}

Unsorted table~A data:
\begin{center}
\DTLdisplaydb{A}
\end{center}

\dtlsort{index}{B}{\joincompare}

Sorted table~B data:
\begin{center}    
\DTLdisplaydb{B}
\end{center}

Merged tables:
\begin{center}
\begin{tabular}{lll}%
\bfseries Index & \bfseries Values (B) & \bfseries Values (A)
\DTLforeach*{B}{\Index=index,\Values=values}{%
  \\%
  \Index & \Values & 
   \edef\dogetrow{\noexpand\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}}%
   \dogetrow
   \dtlgetentryfromcurrentrow{\valuesA}{\dtlcolumnindex{A}{values}}%
   \valuesA
}%
\end{tabular}%
\end{center}
\end{document}

得出的结果为:

数据图像

然而我认为最好将数据存储在 SQL 表中,然后使用 SQL 语句提取所需数据。您可以编写一个脚本来执行此操作,并将相应的结果写入 TeX 可以读取的文件中,或者您可以使用datatooltk将信息(使用 SQL SELECT 语句)拉入datatool数据库文件。如果您有大量数据,情况尤其如此,因为 TeX 并非设计为结构化查询语言。

更新:

下面的示例是上述示例的修改,它还显示数据,首先在散点图中,然后在条形图中。为此,更容易将相关值从数据库附加A到数据库B

\documentclass{article}

\usepackage{etoolbox}
\usepackage{datatool}
\usepackage{dataplot,databar}

\DTLloaddb{A}{data1.csv}
\DTLloaddb{B}{data2.csv}

% Define a comparison handler

\newcommand{\joincompare}[3]{%
 \ifstrequal{#2}{#3}%
 {%
   #1=0\relax
 }%
 {%
   % Fetch the row in A that corresponds to index "#2"
   \dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#2}%
   % Fetch the "values" field in that row
   \dtlgetentryfromcurrentrow{\valueI}{\dtlcolumnindex{A}{values}}%
   % Fetch the row in A that corresponds to index "#3"
   \dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{#3}%
   % Fetch the "values" field in that row
   \dtlgetentryfromcurrentrow{\valueII}{\dtlcolumnindex{A}{values}}%
   % Compare the values
   % (This assumes values are plain numbers with a full stop as the
   % decimal point and no number group separators. If this isn't the
   % case you'll need to use \DTLifnumlt instead of \dtlifnumlt.)
   \dtlifnumlt{\valueI}{\valueII}{#1=-1\relax}{#1=1\relax}%
 }%
}

\begin{document}

Unsorted table~B data:
\begin{center}
\DTLdisplaydb{B}
\end{center}

Unsorted table~A data:
\begin{center}
\DTLdisplaydb{A}
\end{center}

\dtlsort{index}{B}{\joincompare}

Sorted table~B data:
\begin{center}    
\DTLdisplaydb{B}
\end{center}

Append ``values'' column from A to B (only for those rows that
are in B). Call the new column ``valuesA'':
\dtlforcolumn{\Index}{B}{index}{%
   \edef\dogetrow{\noexpand\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}}%
   \dogetrow
   \dtlgetentryfromcurrentrow{\valuesA}{\dtlcolumnindex{A}{values}}%
   \edef\dogetrow{\noexpand\dtlgetrowforvalue{B}{\dtlcolumnindex{A}{index}}{\Index}}%
   \dogetrow
   \dtlappendentrytocurrentrow{valuesA}{\valuesA}%
   \dtlrecombine
}%

Updated table B:
\begin{center}
\DTLdisplaydb{B}
\end{center}

Plot of ``values'' column against ``valuesA'' column shown in
figure~\ref{fig:plot}. A bar chart is shown in
figure~\ref{fig:bar}.

\begin{figure}[htbp]
\centering
% round values on x tics to 1 d.p.
\setcounter{DTLplotroundXvar}{1}%
% similarly for the y axis
\setcounter{DTLplotroundYvar}{1}%
\DTLplot
{B}% database name
{%
   x=values,% column to use for x coords
   y=valuesA,% column to use for y coords
   xlabel={Values in column ``values''},% x axis label
   ylabel={Values in column ``valuesA''},% y axis label
   style=markers,% scatter plot (rather than line plot)
   marks={\pgfuseplotmark{x}},% plot markers
   ticdir=out,% direction of tics
   xticgap=0.1,% increment between tic marks on x axis
   yticgap=0.1,% increment between tic marks on y axis
   width=3in,% width of x-axis
   height=3in,% height of y-axis
}%
\caption{Plot of Values in Database ``B''}
\label{fig:plot}
\end{figure}

\begin{figure}[htbp]
\centering
% set the width of each bar:
\setlength{\DTLbarwidth}{5mm}
% set the colour of the bars:
\DTLsetbarcolor{1}{blue}%
\DTLsetbarcolor{2}{red}%
% Display the bar chart:
\DTLmultibarchart
{% bar chart settings
   variables={\ValuesB,\ValuesA},% plot variables
   length=2in,% length of Y axis
   axes=both,% show both axes
   uppermultibarlabels={B,A},% upper bar labels
   barlabel={\Index},% bar group labels
   verticalbars=true,% vertical orientation
}%
{B}% database name
{\ValuesB=values,\ValuesA=valuesA,\Index=index}% assignment list
\caption{Bar Chart of Values in Database ``B''}
\label{fig:bar}
\end{figure}

\end{document}

这产生了以下情节:

图1

和条形图:

图 2

警告:TeX 是一种排版语言,而不是结构化查询语言或数学语言,因此,将所有这一切都用 TeX 来完成比将 TeX 与 SQL 和 R/Matlab/Octave 结合起来要慢。

请注意,使用最新版本datatool(v2.17),您可以替换

\edef\dogetrow{\noexpand\dtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}}%
\dogetrow

\edtlgetrowforvalue{A}{\dtlcolumnindex{A}{index}}{\Index}%

(对于 也类似B

相关内容