I have a tab-delimited model input file that I would like to alter for ensemble analysis. It is formatted as shown below:
cat input.txt
#############################################
### Parameter file for the program ###
#############################################
### GENERAL PARAMETERS
4 /* nbout # Number of outputs */
46 /* numesp # Number of species */
0.05 /* p # light incidence param (diff through turbid medium) */
0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
1 /* vox_la_max. The max voxel leaf area. */
0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
0.1 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
0.05 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
### Species description
**** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
### Climate (input environment)
25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
I have another tab-delimited file of multipliers, drawn from a distribution, formatted as follows:
cat multipliers.txt
2 3 4
3 2 2
4 3 3
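I am trying to multiply 3 specific input fields by the multipliers to generate a series of new input files, one per row of the multipliers file (3 in this example), while keeping the rest of the input file unchanged. In this case I want to multiply vox_la_max, knockout_max and shed_prob by 2, 3 and 4 respectively for the first file, by 3, 2 and 2 for the second file, and by 4, 3 and 3 for the third file. That would give 3 new files, as shown below: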
cat input1.txt
#############################################
### Parameter file for the program ###
#############################################
### GENERAL PARAMETERS
4 /* nbout # Number of outputs */
46 /* numesp # Number of species */
0.05 /* p # light incidence param (diff through turbid medium) */
0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
2 /* vox_la_max. The max voxel leaf area. */
0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
0.3 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
0.2 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
### Species description
**** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
### Climate (input environment)
25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
cat input2.txt
#############################################
### Parameter file for the program ###
#############################################
### GENERAL PARAMETERS
4 /* nbout # Number of outputs */
46 /* numesp # Number of species */
0.05 /* p # light incidence param (diff through turbid medium) */
0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
3 /* vox_la_max. The max voxel leaf area. */
0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
0.2 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
0.1 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
### Species description
**** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
### Climate (input environment)
25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
cat input3.txt
#############################################
### Parameter file for the program ###
#############################################
### GENERAL PARAMETERS
4 /* nbout # Number of outputs */
46 /* numesp # Number of species */
0.05 /* p # light incidence param (diff through turbid medium) */
0.1357158 0.2446549 0.3535940 0.4992873 0.6449806 0.6957850 0.7465893 0.8130218 0.8794543 0.9397271 1.0000000 0.9397271 0.8794543 0.8078294 0.7362045 0.6899817 0.6437589 0.5989616 0.5541642 0.4617186 0.3692730 0.3633708 0.3574686 0.2426215 /* normalized daily light course (from 7am to 7pm, with a half-hour time-step */
4 /* vox_la_max. The max voxel leaf area. */
0 /* l_growth_scheme. 0 = top down; 1 = random; 2 = homogeneous; 3 = bottom up */
0.3 /* knockout_max. Parameter controlling the extent to which lianas can knock out trees */
0.15 /* shed_prob. With this probability, the liana is completely shed from the voxel. */
### Species description
**** Nmass LMA wsg dmax hmax ah tmax seedmass Fregdistgr Pmass g1 s_liana
Alvaradoa_amorphoides 0.0214 74.775 0.584 0.5 24.44 0.892 1 0.0078 40 0.00145 3.77 0
Annona_reticulata 0.0350 74.529 0.503 0.5 24.44 0.892 1 0.2392 40 0.00142 3.77 0
Brosimum_alicastrum 0.0201 104.281 0.760 0.5 17.31 0.117 1 1.2486 40 0.00097 3.77 0
### Climate (input environment)
25.47447 26.02723 26.87827 27.58436 26.95839 25.63987 25.61669 25.26543 24.99990 24.10808 24.71997 24.67287 /*Temperature in degree C*/
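I think I should use awk, but so far I have only managed to change one parameter at a time, using one column of the multipliers file at a time, and I need to be able to change all 3 parameters at once. What kind of script could I set up to generate these outputs?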
Answer 1
TL;DR: awk
A compact script, hard-coded for your example:
NR != FNR {
    out = "out" FNR ".txt"
    printf "" > out
    for (l=m=1; l <= nl; l++)
        printf tmpl[l] ORS, l in vals ? $(m++)*vals[l] : 0 >> out
    close(out)
    next
}
{
    gsub(/%/, "%%")
    # here is the regex that selects the fields by their name
    if ($3 ~ /^(vox_la_max|knockout_max|shed_prob)[^[:alnum:]_]*$/) {
        vals[NR] = $1
        sub(/^[0-9]+(\.[0-9]+)?/, OFMT)
    }
    tmpl[NR] = $0; nl++
}
Use it as:
LC_NUMERIC=C awk -f script input.txt multipliers.txt
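It produces output files named outX.txt, where X is the row number in multipliers.txt.
The LC_NUMERIC=C bit is needed if your locale uses a comma instead of a dot as the decimal separator for floating-point values.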
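The core trick of the script above, turning each input line into a printf format string, can be tried on a single line in isolation. This is only a minimal sketch: the sample line (with a literal % added to show the escaping), the `mult` variable and the `result` shell variable are made up for illustration and are not part of the original script.

```shell
# Hypothetical sample line; the "100%" is there only to exercise the % escaping.
line='0.1 /* knockout_max. Up to 100% of trees */'

result=$(printf '%s\n' "$line" | LC_NUMERIC=C awk -v mult=3 '
{
    gsub(/%/, "%%")                    # protect literal % signs from printf
    value = $1                         # remember the original number
    sub(/^[0-9]+(\.[0-9]+)?/, OFMT)    # leading number becomes "%.6g" (default OFMT)
    printf($0 ORS, value * mult)       # the line itself is now the format string
}')

printf '%s\n' "$result"
```

With these inputs the line comes back with 0.3 in place of 0.1 and the rest of the text, including the single %, unchanged.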
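For simplicity I made a few assumptions that seemed reasonable:
- The wanted input fields are always lone values at the start of their line, followed by a comment that gives the field name as a single word, which must be separated from the /* by blanks (at least one space)
- No two fields share the same name
- Floating-point values are written only with digits and (possibly) one dot, i.e. no exponents or other scientific notation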
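To see what the last assumption means in practice, the number-matching regex from the script can be run against a few value styles. This is a sketch with made-up sample lines; the `<NUM>` marker merely stands in for the format specification that the real script substitutes.

```shell
# Hypothetical sample values: plain float, integer, exponent form, leading dot, negative.
result=$(printf '%s\n' '0.05 /* a */' '7 /* b */' '1e-3 /* c */' '.5 /* d */' '-2 /* e */' |
    awk '{ sub(/^[0-9]+(\.[0-9]+)?/, "<NUM>"); print }')

printf '%s\n' "$result"
```

Only the first two lines are replaced cleanly. The exponent form is cut after its leading digit (leaving `<NUM>e-3`), and values starting with a dot or a sign are not matched at all, so inputs of those kinds would need a broader regex.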
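The same script as above, but verbose, commented, and extended to allow:
- specifying the wanted fields arbitrarily by line number
- specifying the wanted fields arbitrarily by name, as it appears in the comment on each field's input line
- output files automatically named after the input file, which may have one extension (e.g. .txt); its path, if given, must not contain dots, so it is best to run the script from the directory that contains the input file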
# some preparations
BEGIN {
    # output files named as the input file name
    split(ARGV[1], f, ".")
    outpfx = f[1]
    # remember wanted fields specified on command line as comma-separated line numbers
    if (nums) {
        # split variable "nums" on comma into helper array "r"
        n = split(nums, r, ",")
        # loop over helper array to build final array, thus indexed by wanted line numbers
        while (n) rows[r[n--]]
    }
}
# here we operate on multipliers file
NR != FNR {
    # output file name for this set of multipliers
    out = outpfx FNR ".txt"
    # create/overwrite this output file
    printf "" > out
    # loop over template lines scanned from input file
    for (linenum = multnum = 1; linenum <= numlines; linenum++)
        # use the template line as printf format string to consume values to be multiplied (if any)
        printf tmpl[linenum] ORS, linenum in wanted_values ? $(multnum++)*wanted_values[linenum] : 0 >> out
    close(out)
    next
}
# here we scan the input file to build a template for printf
{
    # escape existing % chars as we are going to leverage printfs own format string which is %-based
    gsub(/%/, "%%")
    # on specified line numbers or named fields:
    if (NR in rows || names && match($3, "^("names")[^[:alnum:]_]*$")) {
        # remember this value
        wanted_values[NR] = $1
        # replace the original value with the printfs conversion specification for floating-point values
        # it will be used by printf later on while processing the multipliers file
        sub(/^[0-9]+(\.[0-9]+)?/, OFMT)
    }
    # remember this whole line as a template
    tmpl[NR] = $0; numlines++
}
Use it like this:
# specify fields by their line numbers, each separated by a comma
LC_NUMERIC=C awk -f script -v nums=36,38,39 input.txt multipliers.txt
# or specify fields by their names, each separated by the | character (NOTE it's a regexp)
LC_NUMERIC=C awk -f script -v names='vox_la_max|knockout_max|shed_prob' input.txt multipliers.txt
# or also use both ways of specifying fields
LC_NUMERIC=C awk -f script -v nums=15,112,234,71,5 -v names='vox_la_max|numesp' input.txt multipliers.txt
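The output-name prefix is whatever precedes the first dot in the first file argument, which is why dots in the path get in the way. A small sketch of that rule on its own, using hypothetical file names (the files do not need to exist, since an awk program consisting only of a BEGIN block reads no input):

```shell
# Hypothetical paths; only the names matter here.
result=$(for path in input.txt input.v2.txt ./input.txt; do
    awk 'BEGIN { split(ARGV[1], f, "."); print ARGV[1] " -> \"" f[1] "\"" }' "$path"
done)

printf '%s\n' "$result"
```

The first two names both give the prefix "input", while ./input.txt gives an empty prefix, so the outputs would be named 1.txt, 2.txt and so on.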
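If you specify more fields than there are multipliers, the excess fields become 0 (they are multiplied by 0).
If you specify fewer fields than there are multipliers, the excess multipliers are ignored.
In any case, fields always consume multipliers in the order of the line numbers on which they appear, i.e. the first wanted field encountered in the input file consumes the first multiplier, no matter how you specified that field.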
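These rules follow from how awk treats a reference to a field beyond the last one: it reads as an empty string, which counts as 0 in arithmetic. A minimal sketch, assuming the three original values from the question (1, 0.1, 0.05) and two made-up multiplier rows, one too short and one too long:

```shell
# Row 1 has only two multipliers; row 2 has four (both rows are hypothetical).
result=$(printf '%s\n' '2 3' '2 3 4 5' | LC_NUMERIC=C awk '
BEGIN { n = split("1 0.1 0.05", vals, " ") }    # wanted values, in file order
{
    m = 1
    for (i = 1; i <= n; i++)
        print "row " NR ": " (vals[i] * $(m++))  # a missing multiplier reads as "" and acts as 0
}')

printf '%s\n' "$result"
```

Row 1 yields 2, 0.3 and 0 (the third field has no multiplier left), and row 2 yields 2, 0.3 and 0.2 (the fourth multiplier is never read).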
Answer 2
Use Perl to generate the output files.
perl -wMstrict -pale '
  BEGIN {
    ## variables declaration
    use vars qw($name $extn);
    use vars qw($header $template $footer);
    use vars qw(@glob_params $num_re);
    ## pick apart include.txt filename into its components
    ($name, $extn) =
      pop =~ m{(.*?)(\.[^.]*|)$}x;
    ## split and stuff into variables the include.txt file
    ($header, $template, $footer) =
      do{local $/;<STDIN>;} =~
        m{\A
          (.*?\n)
          (\#+\h*GENERAL\h+PARAMETERS\h*\n.*?\n\n)
          (.*)
        }xms;
    ## names of global parameters to vary
    @glob_params =
      qw( vox_la_max knockout_max shed_prob);
    # construct the regex to search for numbers
    my $sign = qr{ [-+] }x;
    my $float = qr{ \d+(?:\.\d*)?|\.\d+ }x;
    my $exponent = qr{ [eE][-+]?\d+ }x;
    $num_re =
      qr{ $sign? $float $exponent? }x;
  }
  ##### multiplier.txt processed from here onwards
  my %mult;
  @mult{ @glob_params } = @F;
  my $template_copy = $template;
  for (@glob_params) {
    $template_copy =~
      s{^($num_re)(?=\h+/\*\h*\Q$_\E\b)}{$1 * $mult{$_}}xme;
  }
  my $out = sprintf "%s%d%s",
    $name, $., $extn;
  open my $fh, ">", $out
    or die "Opening $out for writing:$!";
  select $fh;
  s/.*/$header$template_copy$footer/;
' multipliers.txt < include.txt include.txt