A complex diff method

Question

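This kind of data can be stored as a multi-level associative array (or, in Perl terms, a Hash-of-Hashes / HoH; see perldsc, the Perl Data Structures Cookbook). The first-level key is the node name, and the second-level key (which I call a "sub-key" in the script below) is the relevant field name (Availability, State, Reason, etc.).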
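
To make that layout concrete, here is a minimal sketch of what one entry in such a hash might look like. The node name and field values are borrowed from the sample output further down purely for illustration; they are not read from a real input file.

```perl
use strict;
use warnings;

# Illustrative data only: one node with two of its fields.
my %old = (
    '10.72.12.148' => {
        name  => '10.72.12.148',
        State => 'enabled',
    },
);

# The first-level key selects the node, the second-level key the field.
print $old{'10.72.12.148'}{State}, "\n";

# Walk every node and every field stored for it.
for my $node (sort keys %old) {
    for my $field (sort keys %{ $old{$node} }) {
        print "$node / $field = $old{$node}{$field}\n";
    }
}
```

Each inner hash is reached through a reference, which is why the script can switch between populating %old and %new simply by changing which hash a single reference points at.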

For example, the following script reads both files into such hashes and compares them:

#!/usr/bin/perl

use strict;

die "Usage: $0 oldfile newfile\n" unless (@ARGV == 2) ;

# remember both filename args
my ($oldfile,$newfile) = @ARGV[0,1];

die "$oldfile is not readable or does not exist\n" unless -r $oldfile;
die "$newfile is not readable or does not exist\n" unless -r $newfile;

# Hash variables to hold old and new data
my (%old, %new);

# Hash reference variable pointing to the hash we want
# the main loop to populate at any given moment.
# Starts off pointing to %old, changes to %new after the
# first file reaches end-of-file.
# See https://perldoc.perl.org/perlreftut and
# https://perldoc.perl.org/perlref
my $hashref = \%old;

# variable to hold the name of the current node name as
# the records in the input files are read in.
my $node;

# read and parse input files
while(<>) {
  chomp;
  s/^\s*|\s*$//g; # strip leading and trailing whitespace
  s/\s+:\s+/ : /; # strip excess whitespace around first :

  if (/^Ltm::Node:.*\s+\((.*)\)/) {
    $node = $1;
    $hashref->{$node}{name} = $node;
  } elsif (/ : /) {
    my ($key, $val) = split / : /,$_, 2;
    $hashref->{$node}{$key} = $val
  } else {
    print STDERR "Unknown data '$_' on line $. of $ARGV\n";
  };

  if (eof) {
    close(ARGV);       # reset line counter
    $hashref = \%new;  # start populating %new instead of %old
  }
};

# compare the keys from both files
my @common_keys = ();
foreach my $k (keys %old) {
  if (exists($new{$k})) {
    push @common_keys, $k;
  } else {
    print "Node $k found in $oldfile but not in $newfile\n"
  };
};

foreach my $k (keys %new) {
  if (! exists($old{$k})) {
    print "Node $k found in $newfile but not in $oldfile\n";
  };
}

# The list of sub-keys we care about.
my @subkeys = ('Availability', 'State', 'Reason', 'Monitor',
               'Monitor Status');

# now compare sub-keys in each of the nodes
foreach my $k (@common_keys) {
  foreach my $sk (@subkeys) {
    if ($old{$k}{$sk} ne $new{$k}{$sk}) {
      printf "[%-15s %-14s] Old = \"%s\", new = \"%s\"\n", $k, $sk,
        $old{$k}{$sk}, $new{$k}{$sk};
    }
  }
}

Save the script as, e.g., compare.pl, make it executable with chmod +x compare.pl, and run it like this:

$ ./compare.pl old.txt new.txt  
Node 10.72.12.150 found in old.txt but not in new.txt
Node 10.72.12.149 found in new.txt but not in old.txt
[10.72.12.148    State         ] Old = "enabled", new = "xenabled"
[10.72.7.122     Reason        ] Old = "Node address does not have service checking enabled", new = "xNode address does not have service checking enabled"

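Note: apart from minor differences on the Ltm::Node lines, the data in the two input files was identical, so I had to create some differences by editing new.txt and adding an x in front of some fields. I also added node 10.72.12.150 to old.txt and node 10.72.12.149 to new.txt.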

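Also worth noting: Perl hashes are inherently unordered, so the node differences may be printed in a different order on each run. It would be easy to get a consistent order by sorting the keys of %old while populating the @common_keys array, but you would have to implement a natural sort / version sort subroutine (or use one of the natural-sort modules on CPAN) for the IP addresses to sort correctly. I'll leave that enhancement as an exercise for the reader; it isn't needed for this example.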
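
For readers who want a starting point for that exercise, here is one possible sketch. It assumes every node name is a plain dotted-quad IPv4 address (which may not hold for all node names), and the helper names ip_key and sort_ips are hypothetical, not part of the script above. It uses pack to build a 4-byte key per address rather than a general natural sort.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "10.72.7.122" into a 4-byte string so that
# a plain string comparison orders addresses octet by octet.
# Assumes the argument is a valid dotted-quad IPv4 address.
sub ip_key {
    my ($ip) = @_;
    return pack 'C4', split /\./, $ip;
}

# Hypothetical helper: sort a list of addresses numerically by octet,
# computing each key only once (a Schwartzian transform).
sub sort_ips {
    my @ips = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ ip_key($_), $_ ] } @ips;
}

my @sorted = sort_ips(qw(10.72.12.150 10.72.7.122 10.72.12.148 10.72.12.149));
print "$_\n" for @sorted;
```

In the script, something like sort_ips(keys %old) in place of keys %old would then give a stable order; a plain sort would put 10.72.12.148 before 10.72.7.122, because it compares the text character by character.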

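You can edit the print statements to change the output to suit your needs. You didn't specify an output format, so I just printed what seemed necessary to identify the differences easily.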
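
As one example of such a change, the sketch below emits a tab-separated line per difference, which may be easier to feed into other tools. The helper name format_diff is hypothetical, and treating a missing field as an empty string is my assumption, not something the original script does. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated line for a difference.
# A field that is missing (undef) on either side is shown as empty.
sub format_diff {
    my ($node, $field, $old, $new) = @_;
    return join("\t", $node, $field, $old // '', $new // '') . "\n";
}

print format_diff('10.72.12.148', 'State', 'enabled', 'xenabled');
```

In the script, this would stand in for the printf call inside the sub-key comparison loop.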

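It wouldn't be difficult to write something similar in awk (especially GNU awk, since it has reasonable support for multi-dimensional arrays), but I prefer perl, even though (actually, partly because) it tends to be more verbose than awk.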

Answer 1