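I have a data file that I am preprocessing before sending it to gnuplot. It consists of a header line followed by many lines of data, with a number of data columns. The first three columns always have the same type and order. The total number of columns is constant within a file, but not between files. seqno is not guaranteed to start at 1, but it always increases monotonically.
I have a script that inserts header lines into the data file where I want them, but I would like to change each header depending on which match I am currently at. Specifically, I want to prepend the value of $i to the 4th, 5th, ... last columns of my header. Apart from that, the header is the same at every location.
This will be run as a script, so if I need to preprocess the header to find out how many columns it has, that can easily be done.
My current script, without the desired header substitution, is (more example headers are at the end):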
header=$(head -n1 "$input")
awk -v i=3 -v hdr="$header" 'NR>1 && $i!=p {print "\n\n"hdr}{p=$i} 1' "$input" > "$output"
An example of my input:
#filename seqno phasename a b c scale Rwp
blah_001.xye 1 corundum 3 3 12 0.001 3
blah_002.xye 2 corundum 3.1 3.1 12.1 0.002 3.5
blah_003.xye 3 corundum 3.2 3.2 12.2 0.001 3.1
blah_001.xye 2 silcon_NIST 5.4 5.4 5.4 0.002 3
blah_002.xye 3 silcon_NIST 5.41 5.41 5.41 0.004 3.5
blah_003.xye 4 silcon_NIST 5.42 5.42 5.42 0.002 3.1
My current output is:
#filename seqno phasename a b c scale Rwp
blah_001.xye 1 corundum 3 3 12 0.001 3
blah_002.xye 2 corundum 3.1 3.1 12.1 0.002 3.5
blah_003.xye 3 corundum 3.2 3.2 12.2 0.001 3.1


#filename seqno phasename a b c scale Rwp
blah_001.xye 2 silcon_NIST 5.4 5.4 5.4 0.002 3
blah_002.xye 3 silcon_NIST 5.41 5.41 5.41 0.004 3.5
blah_003.xye 4 silcon_NIST 5.42 5.42 5.42 0.002 3.1
My desired output is:
#filename seqno phasename corundum_a corundum_b corundum_c corundum_scale corundum_Rwp
blah_001.xye 1 corundum 3 3 12 0.001 3
blah_002.xye 2 corundum 3.1 3.1 12.1 0.002 3.5
blah_003.xye 3 corundum 3.2 3.2 12.2 0.001 3.1


#filename seqno phasename silcon_NIST_a silcon_NIST_b silcon_NIST_c silcon_NIST_scale silcon_NIST_Rwp
blah_001.xye 2 silcon_NIST 5.4 5.4 5.4 0.002 3
blah_002.xye 3 silcon_NIST 5.41 5.41 5.41 0.004 3.5
blah_003.xye 4 silcon_NIST 5.42 5.42 5.42 0.002 3.1
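What I want to do: how can I alter the hdr variable inside awk so that the value of $i is prepended to the 4th through last columns of hdr before it is inserted into the output?
Some other example headers from other files to be processed: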
#filename seqno phasename temp temp_err csL csL_err csG csG_err strL strL_err strG strG_err B_Na B_Na_err B_Mg B_Mg_err B_F B_F_err B_H B_H_err B_O B_O_err B_Fe B_Fe_err F_occ F_occ_err Na_x Na_x_err Na_z Na_z_err F1_x F1_x_err F1_y F1_y_err F1_z F1_z_err F2_x F2_x_err F2_z F2_z_err a1 a1_err a2 a2_err a3 a3_err a4 a4_err a5 a5_err a6 a6_err a7 a7_err s1 s1_err s2 s2_err s3 s3_err a a_err b b_err c c_err al al_err be be_err ga ga_err volume volume_err mass mass_err MAC MAC_err density density_err LAC LAC_err Lvol Lvol_err e0 e0_err scale scale_err wt% wt%_err num_area num_area_err r_bragg r_bragg_err r_wp r_wp_err r_exp r_exp_err gof gof_err
#filename seqno phasename csL csL_err strG strG_err a a_err b b_err c c_err al al_err be be_err ga ga_err volume volume_err mass mass_err MAC MAC_err density density_err LAC LAC_err Lvol Lvol_err e0 e0_err scale scale_err wt% wt%_err num_area num_area_err r_bragg r_bragg_err r_wp r_wp_err r_exp r_exp_err gof gof_err
#filename seqno phasename csG strL F1_x F1_y F1_z volume gof
Answer 1
$ awk -f script.awk file
#filename seqno phasename corundum_a corundum_b corundum_c corundum_scale corundum_Rwp
blah_001.xye 1 corundum 3 3 12 0.001 3
blah_002.xye 2 corundum 3.1 3.1 12.1 0.002 3.5
blah_003.xye 3 corundum 3.2 3.2 12.2 0.001 3.1

#filename seqno phasename silcon_NIST_a silcon_NIST_b silcon_NIST_c silcon_NIST_scale silcon_NIST_Rwp
blah_001.xye 2 silcon_NIST 5.4 5.4 5.4 0.002 3
blah_002.xye 3 silcon_NIST 5.41 5.41 5.41 0.004 3.5
blah_003.xye 4 silcon_NIST 5.42 5.42 5.42 0.002 3.1
where script.awk is:
BEGIN { OFS = "\t" }

/^#/ {
    # save header fields
    for (i = 1; i <= NF; ++i)
        header[i] = $i
    next
}

# if column 2 contains a lower number than the previous line
# (or if no previous line with data), then output header
$2 < col2 || !col2 {
    # output blank line if needed
    if (print_blank) {
        print ""
    }
    print_blank = 1

    # print first three headers as-is
    for (i = 1; i <= 3; ++i)
        printf("%s%s", header[i], OFS)

    # prepend column three to remaining headers
    for (i = 4; i < NF; ++i)
        printf("%s_%s%s", $3, header[i], OFS)
    printf("%s_%s%s", $3, header[NF], ORS)
}

# print all lines and save value from column 2
{ col2 = $2; print }
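The script saves the header fields of the input data in the array header. Whenever the value in the second column is lower than the value in the second column of the previous line, a new header is output before the data. Each header is preceded by a blank line unless it is the first header. The variable column names are prefixed with the value of the third field.
The script needs no parameters beyond the input file name.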
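The script in the question puts two blank lines before each header, which is the separator gnuplot's index keyword uses to tell data sets apart, while the script above prints one. The following is a minimal sketch, not part of the original answer, of the same seqno-drop approach with the number of blank lines made configurable. The function name prefix_headers and the awk variable blanks are illustrative assumptions, and the header is joined with single spaces rather than OFS.

```shell
#!/bin/sh
# Sketch only: prefix_headers and "blanks" are hypothetical names.
# $1 = number of blank lines between blocks, $2 = data file
prefix_headers() {
    awk -v blanks="$1" '
        /^#/ { n = split($0, header); next }   # remember the header fields
        !seen || $2 < col2 {                   # first data line, or seqno dropped
            if (seen) for (k = 0; k < blanks; ++k) print ""
            seen = 1
            line = header[1]
            for (i = 2; i <= n; ++i)           # prefix columns 4..n with the phase name
                line = line " " (i > 3 ? $3 "_" : "") header[i]
            print line
        }
        { col2 = $2; print }                   # print every data line unchanged
    ' "$2"
}

# Usage on the sample data from the question
sample=$(mktemp)
cat > "$sample" <<'EOF'
#filename seqno phasename a b c scale Rwp
blah_001.xye 1 corundum 3 3 12 0.001 3
blah_002.xye 2 corundum 3.1 3.1 12.1 0.002 3.5
blah_003.xye 3 corundum 3.2 3.2 12.2 0.001 3.1
blah_001.xye 2 silcon_NIST 5.4 5.4 5.4 0.002 3
blah_002.xye 3 silcon_NIST 5.41 5.41 5.41 0.004 3.5
blah_003.xye 4 silcon_NIST 5.42 5.42 5.42 0.002 3.1
EOF
prefix_headers 2 "$sample"
rm -f "$sample"
```

Calling prefix_headers with 1 instead of 2 gives the single blank line that the script above produces.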
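Answer 2
If there is any chance that a phasename group starts with a seqno larger than the last seqno of the previous phase, relying on seqno may fail, and it may be better to rely on phasename instead. You might want to try this adaptation of Kusalananda's proposal: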
awk '
FNR == 1 {
    split($0, header)
    next
}
$3 != LAST {
    printf "%s", TMPRS; TMPRS = ORS
    for (i = 1; i <= NF; ++i)
        printf("%s%s%s", (i > 3) ? $3 "_" : "", header[i], (i == NF ? ORS : OFS))
}
{
    LAST = $3
    print
}
' OFS="\t" filename1 filename2
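To see when the two triggers disagree, here is a small sketch, not taken from either answer, that only counts how many header blocks each rule would start. The function name count_blocks and its rule argument are hypothetical, and the test data is made up so that seqno keeps increasing across the phase change.

```shell
#!/bin/sh
# Sketch only: count_blocks and "rule" are illustrative names.
# $1 = "seqno" (new block when seqno drops) or "phase" (new block when phasename changes)
# $2 = data file
count_blocks() {
    awk -v rule="$1" '
        /^#/ { next }                          # skip the header line
        {
            start = !seen || (rule == "seqno" ? ($2 < prev2) : ($3 != prev3))
            if (start) ++blocks
            seen = 1; prev2 = $2; prev3 = $3
        }
        END { print blocks + 0 }
    ' "$2"
}

# Made-up data in which seqno never drops between the two phases
demo=$(mktemp)
cat > "$demo" <<'EOF'
#filename seqno phasename a
blah_001.xye 1 corundum 3
blah_002.xye 2 corundum 3.1
blah_003.xye 3 silcon_NIST 5.4
blah_004.xye 4 silcon_NIST 5.41
EOF
count_blocks seqno "$demo"   # prints 1: the phase change is missed
count_blocks phase "$demo"   # prints 2: one block per phasename
rm -f "$demo"
```

On the sample input from the question, where seqno drops from 3 to 2 at the phase change, both rules find two blocks.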