How to parse a file to extract the 3-digit numbers listed under each "group number"

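I'm writing a shell script to parse a text file extracted from a standardized PDF file. For each test group (identified by Group 0, Group 1, ...) I want to get the list of test numbers, e.g. 101, 102, 412, ... for Group 0. I've tried sed and awk but haven't managed it. Ideally, I'd like to turn the output into LaTeX code, i.e. have each output item surrounded by a suitable string, for example: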

\section{Group0}
\Testdetails{101}
\Testdetails{102}
...............
\section{Group1}
\Testdetails{305}
................

Here is the source file:

                                                Table 6

                       Tests                     EN 2591-                   Remarks

                                                            All models
 Group 0
 Visual examination                                101
 Examination of dimensions and mass                102      To be performed on one pair per layout, in
                                                            sealed and un-sealed versions
 Contact insertion and extraction forces           412      To be performed on one pair per layout, in
                                                            sealed and un-sealed versions
 Measurement of insulation resistance              206      Only specimens of group 6
 Voltage proof test                                207      Only specimens of group 6
 Contact resistance - Low level                    201
 Contact resistance at rated current               202
 Mating and unmating forces                        408      On specimens of groups 2, 4 and 6
 Visual examination                                101
 Group 1
 Rapid change of temperature                       305
 Visual examination                                101
 Interfacial sealing                               324
 Measurement of insulation resistance              206      Immersed connectors
 Voltage proof test                                207      Immersed connectors
 Insert retention in housing (axial)               410
 Contact retention in insert                       409
 Mechanical strength of rear accessories           420
 Contact retention system effectiveness            426
 (removable contact walkout)
 Visual examination                                101
 Group 2
 Contact retention in insert                       409
 Rapid change of temperature                       305

Answer 1

awk '
    $1 == "Group" {printf("\\section{%s%d}\n", $1, $2); next}
    {for (i=1; i<=NF; i++) 
        if ($i ~ /^[0-9][0-9][0-9]$/) {
            printf("\\Testdetails{%d}\n", $i)
            break
        }
    }
' infile
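To check the awk program without a separate input file, here is a minimal sketch that wraps it in a shell function and feeds it a short excerpt of the table through a here-document. The function name to_latex is hypothetical, chosen only for this example; the awk code is the program above with comments added.

```shell
#!/bin/sh
# to_latex is a hypothetical helper name; it runs the awk program from
# Answer 1 on the files given as arguments, or on stdin if there are none.
to_latex() {
    awk '
        # A "Group N" line starts a new section, e.g. \section{Group0}.
        $1 == "Group" { printf("\\section{%s%d}\n", $1, $2); next }
        # Otherwise print the first field that is exactly three digits.
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9][0-9][0-9]$/) {
                    printf("\\Testdetails{%d}\n", $i)
                    break
                }
        }
    ' "$@"
}

# Demo on a short excerpt of the table from the question.
to_latex <<'EOF'
 Group 0
 Visual examination                                101
 Examination of dimensions and mass                102      To be performed on one pair per layout, in
                                                            sealed and un-sealed versions
 Group 1
 Rapid change of temperature                       305
EOF
```

Called with a file name (to_latex infile) it reads that file instead of standard input. For the excerpt above it should print \section{Group0}, \Testdetails{101}, \Testdetails{102}, \section{Group1} and \Testdetails{305}, one per line; the continuation line contains no three-digit field and produces nothing.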

Update based on the comments:

awk '
    $1 == "Group" {printf("\\section{%s %d}\n", $1, $2); next}
    {
      title = sep = ""
      for (i=1; i<=NF; i++) 
        if ($i ~ /^[0-9][0-9][0-9]$/) {
          printf("\\subsection{%s} \\Testdetails{%d}\n", title, $i)
          break
        }
        else {
          title = title sep $i
          sep = FS
        }
    }
' infile
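A similar sketch for the updated program, again with a hypothetical function name (to_latex_titles) and a table excerpt as input. Because awk splits each line on runs of blanks and the title is rebuilt with FS (a single space by default), the wide gaps in the PDF text collapse to single spaces inside the \subsection title.

```shell
#!/bin/sh
# to_latex_titles is a hypothetical helper name wrapping the updated awk
# program from Answer 1; it reads the given files, or stdin if none.
to_latex_titles() {
    awk '
        $1 == "Group" { printf("\\section{%s %d}\n", $1, $2); next }
        {
            # Words before the three-digit number form the test title.
            title = sep = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9][0-9][0-9]$/) {
                    printf("\\subsection{%s} \\Testdetails{%d}\n", title, $i)
                    break
                } else {
                    title = title sep $i
                    sep = FS
                }
        }
    ' "$@"
}

# Demo on a short excerpt of the table from the question.
to_latex_titles <<'EOF'
 Group 1
 Rapid change of temperature                       305
 Insert retention in housing (axial)               410
EOF
```

For this excerpt it should print \section{Group 1}, then \subsection{Rapid change of temperature} \Testdetails{305} and \subsection{Insert retention in housing (axial)} \Testdetails{410}.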

Answer 2

One way to do it with perl and a regular expression, assuming infile contains what you posted in the question.

Contents of script.pl:

use warnings;
use strict;

while ( <> ) { 
    chomp;
    if ( m/\A\s*(Group)\s*(\d+)/ ) { 
        printf qq[\\Section{%s}\n], $1 . $2; 
        next;
    }   

    if ( m/\s(\d{3})(?:\s|$)/ ) { 
        printf qq[\\Testdetails{%s}\n], $1; 
    }   
}

Run it like this:

perl script.pl infile

It produces the following output:

\Section{Group0}
\Testdetails{101}
\Testdetails{102}
\Testdetails{412}
\Testdetails{206}
\Testdetails{207}
\Testdetails{201}
\Testdetails{202}
\Testdetails{408}
\Testdetails{101}
\Section{Group1}
\Testdetails{305}
\Testdetails{101}
\Testdetails{324}
\Testdetails{206}
\Testdetails{207}
\Testdetails{410}
\Testdetails{409}
\Testdetails{420}
\Testdetails{426}
\Testdetails{101}
\Section{Group2}
\Testdetails{409}
\Testdetails{305}

Answer 3

For completeness, here is a sed version (note that \+ and \b are GNU sed extensions):

sed -n -e 's#^ *Group \([0-9]\+\).*#\\Section{Group\1}#p' \
       -e 's#.*\b\([0-9][0-9][0-9]\)\b.*#\\Testdetails{\1}#p' infile
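The sed version above relies on GNU extensions (\+ and \b). A minimal sketch of a more portable variant, assuming a sed that supports -E for extended regular expressions (GNU and BSD sed both do), replaces \b with an explicit requirement that the three digits are preceded by whitespace and followed by whitespace or the end of the line. That assumption holds for the table in the question, where every test number follows its test name, and it also keeps the "EN 2591-" header from matching. The function name to_latex_sed is hypothetical, and this variant writes \section, as in the question's example.

```shell
#!/bin/sh
# Portable sketch of Answer 3; to_latex_sed is a hypothetical name.
# Assumes sed supports -E (extended regular expressions).
to_latex_sed() {
    sed -n -E \
        -e 's/^[[:space:]]*Group[[:space:]]+([0-9]+).*/\\section{Group\1}/p' \
        -e 's/^.*[[:space:]]([0-9]{3})([[:space:]].*)?$/\\Testdetails{\1}/p' \
        "$@"
}

# Demo: the header line (EN 2591-) has no standalone 3-digit number,
# so it prints nothing; the Group line and the test lines are converted.
to_latex_sed <<'EOF'
                       Tests                     EN 2591-                   Remarks
 Group 2
 Contact retention in insert                       409
 Rapid change of temperature                       305
EOF
```

As in the GNU version, the greedy .* means that if a line ever held two standalone three-digit numbers, the last one would be kept.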
