PGFplots: finding duplicates in \pgfplotstablecreatecol

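Building on an earlier question, I am trying to find duplicates in the generated list of hashes. Duplicates can occur either because I accidentally used the same X and Y values twice, or because the first three digits of two hashes are equal. In that case I would like to raise a compiler error such as \errmessage{Hash \tempHash already used!}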

Everything inside the \calcHash function seems very fragile, and I don't know how to add functionality without breaking the hash calculation.

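In the following MWE I tried to find duplicates with a datatool database. That is not mandatory, though: if there is another simple way to search for duplicates, I am perfectly happy with it :)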

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\usepackage{xstring}
\usepackage{datatool}
\pgfplotsset{compat=newest}

\DTLnewdb{hashDB}
\newcommand{\calcHash}[1]{
    \noexpand\StrLeft{\pdfmdfivesum{#1}}{3}
    \newcommand{\tempHash}{\StrLeft{\pdfmdfivesum{#1}}{3}}
    %\DTLifdbempty{hashDB}
    %{
        %\DTLnewrow{hashDB}
        %\DTLnewdbentry{hashDB}{Hash}{\tempHash}
    %} 
    %\DTLforeach{hashDB}{\hash=hash}
    %{
        %\ifthenelse{\equal{\tempHash}{\hash}}
        %{
        %    \errmessage{Hash \tempHash already used!}
        %}{}
    %}
}

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\pgfplotstablecreatecol[
create col/assign/.code={
        \edef\myHash{\noexpand\calcHash{\thisrow{X}\thisrow{Y}}}
        \pgfkeyslet{/pgfplots/table/create col/next content}\myHash
}]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}

Answer 1

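The following uses an L3 sequence and the L3 MD5 hash function to implement \calcHash. Note that \calcHash is used in place, rather than being stored in another macro that is then assigned to next content.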

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash }
  { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    % Expand the argument and store its MD5 hash as a string.
    \str_set:Ne \l__pascals_hash_str { \str_mdfive_hash:e {#1} }
    % Raise an error if this hash was seen before; otherwise remember it.
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    % Hand the hash to pgfplotstable as the content of the new cell.
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content } \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
create col/assign/.code={%
  \calcHash{\thisrow{X}\thisrow{Y}}%
}]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}
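
To check that the duplicate detection actually fires, one can feed it a table in which two rows share the same X and Y values. The following is a minimal sketch that reuses the definitions above unchanged; the table name \dupdata and its contents are made up for illustration. Compilation is expected to stop at the third row with the duplicate-hash error.

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash }
  { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    \str_set:Ne \l__pascals_hash_str { \str_mdfive_hash:e {#1} }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content } \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

% Hypothetical test data: rows 1 and 3 are identical on purpose.
\pgfplotstableread[]{
X Y
1 a
2 b
1 a
}\dupdata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
create col/assign/.code={%
  \calcHash{\thisrow{X}\thisrow{Y}}% the third row repeats the hash of the first
}]{ID}{\dupdata}

\pgfplotstabletypeset[string type]{\dupdata}
\end{document}

Since the hashes are collected in a global sequence, \clearHashes should be called before each table that is meant to be checked independently of the previous ones.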

A variant that uses only the first three tokens of the resulting hash:

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash }
  { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_generate_variant:Nn \str_range:nnn { e }
\cs_new_protected:Npn \__pascals_calc_hash:n #1
  {
    \str_set:Ne \l__pascals_hash_str
      { \str_range:enn { \str_mdfive_hash:e {#1} } { 1 } { 3 } }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content } \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { m } { \__pascals_calc_hash:n {#1} }
\ExplSyntaxOff

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
create col/assign/.code={%
  \calcHash{\thisrow{X}\thisrow{Y}}%
}]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}

Yet another variant uses the full hash by default, but takes an optional argument to use only the first n characters.

\documentclass[10pt,a4paper]{article}
\usepackage{pgfplotstable}
\pgfplotsset{compat=newest}

\ExplSyntaxOn
\str_new:N \l__pascals_hash_str
\seq_new:N \g__pascals_hashes_seq
\msg_new:nnn { pascals } { duplicate-hash }
  { Hash~ #1~ already~ used! }
\cs_generate_variant:Nn \str_set:Nn { Ne }
\cs_generate_variant:Nn \str_range:nnn { e }
\cs_new_protected:Npn \__pascals_calc_hash:nn #1#2
  {
    \str_set:Ne \l__pascals_hash_str
      { \str_range:enn { \str_mdfive_hash:e {#1} } { 1 } {#2} }
    \seq_if_in:NVTF \g__pascals_hashes_seq \l__pascals_hash_str
      { \msg_error:nnV { pascals } { duplicate-hash } \l__pascals_hash_str }
      { \seq_gput_right:NV \g__pascals_hashes_seq \l__pascals_hash_str }
    \pgfkeyslet { /pgfplots/table/create~ col/next~ content } \l__pascals_hash_str
  }
\NewDocumentCommand \clearHashes {} { \seq_gclear:N \g__pascals_hashes_seq }
\NewDocumentCommand \calcHash { O{-1} m } { \__pascals_calc_hash:nn {#2} {#1} }
\ExplSyntaxOff

\pgfplotstableread[]{
X Y
1 a
2 b
5 c
}\mydata

\begin{document}

\clearHashes
\pgfplotstablecreatecol[
create col/assign/.code={%
  \calcHash[3]{\thisrow{X}\thisrow{Y}}%
}]{ID}{\mydata}
\pgfplotstablegetrowsof{\mydata}
\pgfmathtruncatemacro\myDataRows{\pgfplotsretval-1}

\pgfplotstabletypeset[string type]{\mydata}
\end{document}
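
The optional argument defaults to -1, and \str_range:nnn counts negative indices from the end of the string, so leaving it out keeps the full hash. The fragment below is a sketch of both call forms; it is meant to replace the \pgfplotstablecreatecol call in the document body of the last example and relies on that example's preamble. The column names FullID and ShortID and the length 5 are arbitrary choices for illustration.

% Full hash: no optional argument, so the range runs from 1 to -1 (the last character).
\clearHashes
\pgfplotstablecreatecol[
create col/assign/.code={%
  \calcHash{\thisrow{X}\thisrow{Y}}%
}]{FullID}{\mydata}

% Shortened hash: only the first 5 characters are kept and compared.
\clearHashes
\pgfplotstablecreatecol[
create col/assign/.code={%
  \calcHash[5]{\thisrow{X}\thisrow{Y}}%
}]{ShortID}{\mydata}

A shorter prefix gives more compact IDs but makes accidental collisions between different rows more likely, which is exactly the case the duplicate check is there to catch.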
