A second loop problem

Question 1

In Bash, for loops follow this syntax:

for <variable name> in <a list of items> ; do <some command> ; done

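Let's break it down.

for tells the shell that it is about to iterate over a list of items.

<variable name> gives the shell a place to store the item it is currently on.

in <a list of items> specifies the list to iterate over.

; ends the command. This can be a semicolon or an actual line break in the script.

do <some command> is the command you want to run in the loop. It may use the variable defined earlier in the for loop, but it does not have to.

; again ends the command, this time in preparation for closing the loop.

done closes the loop.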
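As a minimal sketch of the syntax above, the same loop can be written on one line with semicolons or across several lines with line breaks. The function names and the items a b c are made up for illustration.

```bash
# Both functions run the same loop; only the separators differ.
# The names one_line and multi_line and the items a b c are illustrative.

one_line() {
  for item in a b c ; do echo "item: $item" ; done
}

multi_line() {
  for item in a b c
  do
    echo "item: $item"
  done
}

one_line
multi_line
```

Both calls print the same three lines, which is why a semicolon and a line break are interchangeable in this position.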

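So, in what you've added, for f in $my_files, we can see that there is a line break after it, but instead of the do that the shell is expecting, you define a variable. Because the shell isn't expecting that, it exits with a syntax error message. You are also missing a closing done at the end of the code you want to loop over; the while loop has its done, but the for loop doesn't.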
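A rough skeleton of the shape the shell expects might look like the following. It is not the asker's script: the temporary directory, the file names, the sample data and the variable names are all placeholders chosen for the sketch.

```bash
# Hypothetical skeleton: the file names, sample data and variable names
# are placeholders, not the asker's actual script.
workdir=$(mktemp -d)
printf 'a 1 2\nb 3 4\n' > "$workdir/coords.txt"
touch "$workdir/x.nc" "$workdir/y.nc"

nested_loops() {
  for f in "$workdir"/*.nc
  do                                  # do has to come before any other command
    name=$(basename "$f")             # assignments belong inside the loop body
    while read -r station lat lon
    do
      echo "$name $station $lat $lon"
    done < "$workdir/coords.txt"      # this done closes the while loop
  done                                # this done closes the for loop
}

nested_loops
```

Each loop gets its own do and its own done, and the variable assignment sits inside the body rather than between the for line and do.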

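Additionally, you may want to avoid parsing ls. It can cause problems, and for simple things (such as iterating over files) you can easily get the same result by just dropping the ls: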

thegs@wk-thegs-01:test$ ls 
test1.txt  test2.txt  test3.txt
thegs@wk-thegs-01:test$ for file in test*.txt ; do echo $file ; done
test1.txt
test2.txt
test3.txt
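One problem with parsing ls shows up as soon as a file name contains a space. The sketch below compares the two approaches in a throwaway directory; the directory and the file names are made up for the demonstration.

```bash
# Compare looping over a glob with looping over the output of ls.
# The directory and file names are made up for the demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/test 1.txt" "$demo_dir/test2.txt"

count_with_glob() {
  n=0
  for file in "$demo_dir"/test*.txt ; do
    [ -e "$file" ] || continue   # skip the literal pattern if nothing matched
    n=$((n + 1))
  done
  echo "$n"
}

count_with_ls() {
  n=0
  # An unquoted command substitution is split on whitespace, so the
  # name "test 1.txt" is broken into two words here.
  for file in $(ls "$demo_dir") ; do
    n=$((n + 1))
  done
  echo "$n"
}

echo "glob: $(count_with_glob) files"
echo "ls:   $(count_with_ls) words"
```

The glob loop sees the two files, while the ls loop sees three words, because the name with a space is split in two.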

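It never hurts to brush up on loop syntax before moving on. Red Hat has some accessible documentation on loops in bash, which I highly recommend reading (unfortunately they parse ls, but hey, nobody's perfect).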

Answer

Question 2

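Shell is the wrong language for processing data. You should use awk, or perl, or python (or almost any language that isn't shell). See "Why is using a shell loop to process text considered bad practice?" and "Why does my shell script choke on whitespace or other special characters?" for some of the many reasons.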

Also, many languages have library modules for processing NetCDF data... e.g. perl has PDL::NetCDF and python has netCDF4.

Even without a NetCDF processing library, both awk and perl make it much easier to script the common tasks that you might otherwise do in shell.

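For example, here's a perl version of your script - perl chosen because it combines many of the features of sed, awk, cut, and tr into one language, because it has the extremely useful split() function, and finally because perl's system() function can take a list of arguments rather than just a single string (a single string is subject to the same annoyances as shell and requires the same workarounds):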

#!/usr/bin/perl

use strict;
use warnings;
my @coords=();

# Read coords.txt into an array, so we don't have to read it
# again for each year.
#
# Yes, you could read coords.txt into an array in bash too - I very
# strongly encourage you to do so if you decide to stick to shell.
# In bash, it's probably best to read coords.txt into three arrays, one
# each for station, lon, and lat. Or two associative arrays, one each
# for lon and lat (both with station as the key).
# Anyway, see `help mapfile` in bash.

my $coords = "coords.txt";
open(my $C, "<", $coords) || die "couldn't open $coords for read: $!\n";
while(<$C>) {
  next if /^station/; # skip header
  chomp;              # strip the trailing newline
  push @coords, $_;
};
close($C);

# process each year
foreach my $num (2016..2018) {
  my $infile = "era_temperature_$num.nc";

  # process the coords data for the current year
  foreach (@coords) {
    my ($station, $lat, $lon) = split;
    my $outfile = "${station}_${num}_${lat}_${lon}_out.nc";

    system("cdo", "-remapnn", "lon=${lon}_lat=${lat}", $infile, $outfile);
  };
};

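Note that with the list form of system(), it is perfectly safe to use $infile and $outfile without quotes, because each variable is passed whole to cdo as a single argument, no matter what it contains. This is not true in bash: if $infile or $outfile contains any whitespace or wildcard characters (e.g. *, ?) and is used without double quotes, it is subject to the shell's word splitting and filename expansion, which will break the script (so remember to always double-quote your variables in shell).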
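To see the shell side of this in isolation, the sketch below counts how many arguments a command receives with and without double quotes. The helper count_args and the file name are hypothetical.

```bash
# count_args prints how many arguments it was given.
count_args() {
  echo "$#"
}

outfile="my station_2016_out.nc"   # hypothetical name containing a space

unquoted=$(count_args $outfile)    # word splitting: two arguments
quoted=$(count_args "$outfile")    # one argument, as intended

echo "unquoted: $unquoted, quoted: $quoted"
```

A real command such as cdo would receive the same two broken-up arguments in the unquoted case and would most likely fail or write to the wrong file.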

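Here's an alternate version that uses two associative arrays. It may be marginally faster (because it only has to call split() once for each line of coords.txt), but probably not noticeably so unless coords.txt has thousands of lines: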

#!/usr/bin/perl

use strict;
use warnings;
my %lon = ();
my %lat = ();

# Read coords.txt into two hashes (associative arrays), one
# each for lon and lat.

my $coords = "coords.txt";
open(my $C, "<", $coords) || die "couldn't open $coords for read: $!\n";
while(<$C>) {
  next if /^station/; # skip header
  chomp;              # strip the trailing newline
  my ($station, $lat, $lon) = split;
  $lat{$station} = $lat;
  $lon{$station} = $lon;
}
close($C);

foreach my $num (2016..2018) {
  my $infile = "era_temperature_$num.nc";
  foreach my $station (sort keys %lat) {
    # Two different ways of constructing a string from other variables.

    # Simple interpolation, as in the first version above:
    my $outfile = "${station}_${num}_${lat{$station}}_${lon{$station}}_out.nc";

    # And string concatenation with `.`, which can be easier to read
    # in some cases.
    my $lonlat = "lon=" . $lon{$station} . "_lat=" . $lat{$station};

    # Another method is to use sprintf, which can be even easier to read.
    # For example, use the following instead of the line above:
    # my $lonlat = sprintf "lon=%s_lat=%s", $lon{$station}, $lat{$station};
    #
    # note: bash has a printf built-in too.  I highly recommend using it.
    

    system("cdo", "-remapnn", $lonlat, $infile, $outfile);
  };
};
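For anyone who decides to stay in shell, the two-associative-array idea can be sketched in bash roughly as follows. This assumes bash 4 or later, uses made-up sample data in place of coords.txt, follows the output file naming of the first perl version, and only prints the cdo command lines instead of running them.

```bash
#!/usr/bin/env bash
# Sketch for bash 4 or later (associative arrays). The sample data
# stands in for coords.txt, and the cdo command is only printed, not run.
coords=$(mktemp)
printf 'station lat lon\nabc 10.5 20.25\nxyz -3 47\n' > "$coords"

# Read the file once, letting read split each line into three fields.
declare -A lat lon
while read -r station la lo ; do
  [ "$station" = station ] && continue   # skip the header line
  lat[$station]=$la
  lon[$station]=$lo
done < "$coords"

# Sorted list of station names, one array element per name.
mapfile -t stations < <(printf '%s\n' "${!lat[@]}" | sort)

build_commands() {
  local num station infile outfile lonlat
  for num in 2016 2017 2018 ; do
    infile="era_temperature_$num.nc"
    for station in "${stations[@]}" ; do
      # printf -v stores the formatted string in a variable.
      printf -v outfile '%s_%s_%s_%s_out.nc' \
        "$station" "$num" "${lat[$station]}" "${lon[$station]}"
      printf -v lonlat 'lon=%s_lat=%s' "${lon[$station]}" "${lat[$station]}"
      printf '%s\n' "cdo -remapnn $lonlat $infile $outfile"
    done
  done
}

build_commands
```

To actually run the command, the final printf would be replaced with cdo -remapnn "$lonlat" "$infile" "$outfile", with every variable double-quoted.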

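BTW, perl also has some very useful quoting operators, e.g. qw(), which quotes a list of literal words. It does not interpolate variables, so use it for the fixed arguments and pass the variables after it. The system() line could then be written as:

system(qw(cdo -remapnn), "lon=${lon}_lat=${lat}", $infile, $outfile);

or (for the associative array version):

system(qw(cdo -remapnn), $lonlat, $infile, $outfile);

See perldoc -f qw for details.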

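Finally, some people ignorantly claim that perl is hard to read or understand (AFAICT this is mostly because they're scared that perl, like sed, has operators for regular expressions - regexes that aren't wrapped in function calls are apparently scary and unreadable).... IMO, both of the perl examples above are clearer and easier to read and understand than a shell script with multiple command substitutions. They will also run much faster, because they don't have to fork sed and cut four times in each iteration of the loop (i.e. 3 years times however many lines there are in coords.txt).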

Answer