awk/perl statement to validate and return the fixed-structure data lines at the start of a file

Question

Here is a simple script that should be easy to understand and easy to port to other languages such as AWK, Python, or Bash. Usage: perl validate.pl input.txt

use strict;
use warnings;

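my @data;
my $state = 0;    # 0 = no data line yet, 1 = inside the data block, 2 = closing '##' seen
my ($filename) = @ARGV;
my $expr = '^##[[:space:]]*([a-zA-Z0-9_-]+\.)+[[:space:]]+[^[:space:]]+[[:space:]]+[0-9]+[[:space:]]+(\(.*)##[[:space:]]*$';
open my $fh, "<:encoding(utf8)", $filename or die "Could not open $filename: $!";

while (my $line = <$fh>) {
    chomp $line;
    # Line 1 must be the START marker and line 2 a bare '##'.
    if ($. == 1 and $line ne '## CONFIG-PARAMS-START ##') {
        exit 1;
    }
    if ($. == 2 and $line ne '##') {
        exit 1;
    }
    # A bare '##' after at least one data line closes the block.
    if ($state == 1 and $line eq '##') {
        $state = 2;
        next;
    }
    # From line 3 on, every line must be a data line until the block closes.
    if ($. > 2 and $state < 2) {
        if ($line =~ /$expr/) {
            push @data, $line;
            $state = 1;
            next;
        } else {
            exit 1;
        }
    }
    # The line right after the closing '##' must be the END marker.
    if ($state == 2) {
        if ($line eq '## CONFIG-PARAMS-END ##') {
            print join("\n", @data), "\n";
            exit 0;
        } else {
            exit 1;
        }
    }
}

# Reaching end of file without the END marker means the header is incomplete.
exit 1;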

I also wrote a slightly different version that feels more idiomatic:

use strict;
use warnings;

my @data;
my ($filename) = @ARGV;
my $expr = '^##\s*([\w-]+\.)+\s+\S+\s+\d+\s+\(.*##\s*$';
open my $fh, "<:encoding(utf8)", $filename or die "Could not open $filename: $!";

while (<$fh>) {
    chomp;
    if ($. == 1 and !/^## CONFIG-PARAMS-START ##$/) {exit 1}
    if ($. == 2 and !/^##$/) {exit 1}
    if ($. > 2) {
        if (/^##$/ and scalar @data == 0) {exit 1}
        if (/^##$/ and scalar @data  > 0) {
            # The line after the closing '##' must be the END marker.
            # Fall back to '' so a missing line does not trigger a warning.
            my $next = <$fh> // '';
            if ($next =~ /^## CONFIG-PARAMS-END ##$/) {
                print join("\n",@data), "\n"; exit 0;
            } else {exit 1;}
        }
        if (/$expr/) {push @data, $_;} else {exit 1}
    }
}

# Reaching end of file without the END marker means the header is incomplete.
exit 1;
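
For experimenting with the checks without creating a file, the same logic can be wrapped in a sub that returns the data lines instead of calling exit. This is only a sketch: the name extract_config_lines and the sample records are made up for illustration, and the field layout of the data lines is an assumption based on the regular expression above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original scripts): returns an array
# ref of data lines, or nothing (undef in scalar context) if the header
# block is malformed or incomplete.
sub extract_config_lines {
    my ($text) = @_;
    my $expr  = qr/^##\s*([\w-]+\.)+\s+\S+\s+\d+\s+\(.*##\s*$/;
    my @lines = split /\n/, $text;

    # The first two lines must be the START marker and a bare '##'.
    return unless (shift(@lines) // '') eq '## CONFIG-PARAMS-START ##';
    return unless (shift(@lines) // '') eq '##';

    my @data;
    while (defined(my $line = shift @lines)) {
        if ($line eq '##') {
            return unless @data;    # closing '##' before any data line
            my $end = shift(@lines) // '';
            return unless $end eq '## CONFIG-PARAMS-END ##';
            return \@data;          # anything after END is ignored
        }
        return unless $line =~ $expr;
        push @data, $line;
    }
    return;    # input ended before the END marker
}

# Invented sample input; the record contents are placeholders.
my $sample = join "\n",
    '## CONFIG-PARAMS-START ##',
    '##',
    '## example.com. IN 3600 (primary) ##',
    '## mail.example.com. MX 10 (backup) ##',
    '##',
    '## CONFIG-PARAMS-END ##',
    'the rest of the file is ignored',
    '';

my $data = extract_config_lines($sample);
print defined $data ? join("\n", @$data) . "\n" : "invalid header\n";
```

Returning undef rather than exiting keeps the validation reusable from a larger program; a command-line wrapper could still map the result to exit 0 or exit 1.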

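Explanation:

The regular expression uses Perl's shorthand character classes, which I find easier to read. The ASCII equivalents are shown in parentheses; because the file is decoded as UTF-8, the shorthands also match non-ASCII digits, letters, and whitespace unless the /a modifier is used:
- \d matches any digit ([0-9])
- \w matches a word character ([a-zA-Z0-9_])
- \s matches whitespace ([\r\n\t\f\v ])
- \S matches non-whitespace ([^\r\n\t\f\v ])

The other constructs:
- <$fh> reads one line from the file handle $fh
- chomp removes the trailing \n from the current line
- $_ is the current element (here, the current line). It is implied when omitted, so e.g. if (/^##$/) actually means if ($_ =~ /^##$/).
- $. contains the current line number
- scalar @data is the number of elements in the array @data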
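
The points above can be checked with a short, self-contained sketch. The sample record is invented for illustration, and the two patterns are simply the shorthand and POSIX-class spellings taken from the two scripts; on plain ASCII input they should accept the same lines.

```perl
use strict;
use warnings;

# Made-up sample record; its field layout is an assumption for illustration.
my $record = '## example.com. IN 3600 (primary) ##';

# Shorthand and POSIX-class spellings of the same pattern.
my $short = qr/^##\s*([\w-]+\.)+\s+\S+\s+\d+\s+\(.*##\s*$/;
my $posix = qr/^##[[:space:]]*([a-zA-Z0-9_-]+\.)+[[:space:]]+[^[:space:]]+[[:space:]]+[0-9]+[[:space:]]+(\(.*)##[[:space:]]*$/;

printf "short: %s, posix: %s\n",
    ($record =~ $short ? 'match' : 'no match'),
    ($record =~ $posix ? 'match' : 'no match');

# Read from an in-memory handle to show <$fh>, $_, $. and chomp together.
my $text = "first\nsecond\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @seen;
while (<$fh>) {                     # each iteration assigns one line to $_
    chomp;                          # strips the trailing newline from $_
    push @seen, join ':', $., $_;   # $. is the 1-based line number
}
close $fh;

print scalar(@seen), " lines read: @seen\n";
```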
