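I am writing a shell script to parse a text file extracted from a standardized PDF file. For each test group (identified as Group 0, Group 1, ...), I want to get the list of test numbers, e.g. 101, 102, 412, ... for Group 0. I have tried sed and awk, but I have not managed to get it working. Ideally, I would like to turn the output into LaTeX code, with each output item wrapped in a suitable string, for example: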
\section{Group0}
\Testdetails{101}
\Testdetails{102}
...............
\section{Group1}
\Testdetails{305}
................
Here is the source file:
Table 6
Tests EN 2591- Remarks
All models
Group 0
Visual examination 101
Examination of dimensions and mass 102 To be performed on one pair per layout, in
sealed and un-sealed versions
Contact insertion and extraction forces 412 To be performed on one pair per layout, in
sealed and un-sealed versions
Measurement of insulation resistance 206 Only specimens of group 6
Voltage proof test 207 Only specimens of group 6
Contact resistance - Low level 201
Contact resistance at rated current 202
Mating and unmating forces 408 On specimens of groups 2, 4 and 6
Visual examination 101
Group 1
Rapid change of temperature 305
Visual examination 101
Interfacial sealing 324
Measurement of insulation resistance 206 Immersed connectors
Voltage proof test 207 Immersed connectors
Insert retention in housing (axial) 410
Contact retention in insert 409
Mechanical strength of rear accessories 420
Contact retention system effectiveness 426
(removable contact walkout)
Visual examination 101
Group 2
Contact retention in insert 409
Rapid change of temperature 305
Answer 1
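awk '
# A line whose first field is "Group" opens a new section.
$1 == "Group" { printf("\\section{%s%d}\n", $1, $2); next }
# Otherwise print the first field that is exactly three digits.
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9]$/) {
            printf("\\Testdetails{%d}\n", $i)
            break
        }
}
'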
Update based on the comments:
awk '
$1 == "Group" { printf("\\section{%s %d}\n", $1, $2); next }
{
    # Fields before the three-digit test number are joined into the title.
    title = sep = ""
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9]$/) {
            printf("\\subsection{%s} \\Testdetails{%d}\n", title, $i)
            break
        } else {
            title = title sep $i
            sep = FS
        }
}
'
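The thread does not show what the updated program prints, so here is a minimal sketch that runs it on a few lines taken from the question. The wrapper name tests_with_titles is made up for this illustration and is not part of the answer.

```shell
#!/bin/sh
# Hypothetical wrapper (the name is an assumption) around the updated awk program.
tests_with_titles() {
    awk '
    $1 == "Group" { printf("\\section{%s %d}\n", $1, $2); next }
    {
        title = sep = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9][0-9][0-9]$/) {
                # Fields seen so far form the test title.
                printf("\\subsection{%s} \\Testdetails{%d}\n", title, $i)
                break
            } else {
                title = title sep $i
                sep = FS
            }
    }
    ' "$@"
}

# Sample lines copied from the source file in the question.
tests_with_titles <<'EOF'
Group 1
Rapid change of temperature 305
Measurement of insulation resistance 206 Immersed connectors
(removable contact walkout)
EOF
# Prints:
# \section{Group 1}
# \subsection{Rapid change of temperature} \Testdetails{305}
# \subsection{Measurement of insulation resistance} \Testdetails{206}
```

Lines without a three-digit field, such as the wrapped "(removable contact walkout)", print nothing, so wrapped text never becomes part of a title.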
Answer 2
One way using perl and a regexp, assuming infile contains the text you posted in the question.
Content of script.pl:
use warnings;
use strict;
while ( <> ) {
    chomp;
    # Group header: "Group 0" becomes \section{Group0}.
    if ( m/\A\s*(Group)\s*(\d+)/ ) {
        printf qq[\\section{%s}\n], $1 . $2;
        next;
    }
    # First three-digit number delimited by whitespace.
    if ( m/\s(\d{3})(?:\s|$)/ ) {
        printf qq[\\Testdetails{%s}\n], $1;
    }
}
Run it like this:
perl script.pl infile
It produces the following output:
\section{Group0}
\Testdetails{101}
\Testdetails{102}
\Testdetails{412}
\Testdetails{206}
\Testdetails{207}
\Testdetails{201}
\Testdetails{202}
\Testdetails{408}
\Testdetails{101}
\section{Group1}
\Testdetails{305}
\Testdetails{101}
\Testdetails{324}
\Testdetails{206}
\Testdetails{207}
\Testdetails{410}
\Testdetails{409}
\Testdetails{420}
\Testdetails{426}
\Testdetails{101}
\section{Group2}
\Testdetails{409}
\Testdetails{305}
Answer 3
For completeness, here is a sed version:
sed -n -e 's#^ *Group \([0-9]\+\).*#\\section{Group\1}#p' \
       -e 's#.*\b\([0-9][0-9][0-9]\)\b.*#\\Testdetails{\1}#p'
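The \+ and \b used in the command above are GNU extensions, so it may behave differently with other sed implementations. Below is a possible POSIX-only variant, offered as a sketch rather than a drop-in replacement. It assumes, as in the sample file, that the test number is set off from the surrounding text by spaces or sits at the start or end of the line. It writes \section in lowercase, as the question asks, and the function name tests_posix_sed is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper (the name is an assumption) using only POSIX basic regular expressions.
tests_posix_sed() {
    sed -n \
        -e 's#^ *Group \([0-9][0-9]*\).*#\\section{Group\1}#p' \
        -e 's#^\(.* \)\{0,1\}\([0-9][0-9][0-9]\)\( .*\)\{0,1\}$#\\Testdetails{\2}#p' \
        "$@"
}

# Sample lines copied from the source file in the question.
tests_posix_sed <<'EOF'
Tests EN 2591- Remarks
Group 0
Visual examination 101
Mating and unmating forces 408 On specimens of groups 2, 4 and 6
EOF
# Prints:
# \section{Group0}
# \Testdetails{101}
# \Testdetails{408}
```

Here [0-9][0-9]* stands in for [0-9]\+, and the optional groups \(.* \)\{0,1\} and \( .*\)\{0,1\} play the role of \b by requiring a space or a line boundary on each side of the number.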