How different are two (LaTeX) files?

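I needed to find out which chapters I had revised the most over the past year. There are, of course, many metrics for measuring how much two files differ, but I decided to use consecutive word pairs. I want to share this small utility with anyone who has a similar need.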

The program's emphasis is on simplicity: it is a quick hack and easy to modify.

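This is a different need from the one served by the highly recommended latexdiff program. I need basic difference statistics, not a way to reconcile files.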
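To make the consecutive-word-pair metric concrete, here is a minimal sketch on two short strings. The helper names `word_pairs` and `pair_balance` are hypothetical and are not part of the script in the answer. The sketch also splits on whitespace, so it ignores differences in spacing and line breaks, which is an assumption of this example rather than the behavior of the original regex.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return every pair of consecutive whitespace-separated words in a string.
# (Hypothetical helper; whitespace is normalized to a single space.)
sub word_pairs {
    my ($text) = @_;
    my @words = split ' ', $text;
    return map { "$words[$_] $words[$_ + 1]" } 0 .. $#words - 1;
}

# Net count per pair: +1 for each occurrence in the new text,
# -1 for each occurrence in the old text.
sub pair_balance {
    my ($old, $new) = @_;
    my %seen;
    $seen{$_}++ for word_pairs($new);
    $seen{$_}-- for word_pairs($old);
    return %seen;
}

my %seen = pair_balance('the quick brown fox', 'the quick red fox');
for my $pair (sort keys %seen) {
    printf "%-12s %+d\n", $pair, $seen{$pair};
}
```

For these two strings, four of the five distinct pairs end up with a nonzero balance (two added, two removed), and only "the quick" cancels out. Dividing changes by distinct pairs, as the script does, gives 4/5, or 80%.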

Answer 1

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use utf8;

use warnings FATAL => qw{ uninitialized };

use Perl6::Slurp;

use Math::BigFloat;
sub round { Math::BigFloat->new(shift)->bfround(1); }


=pod

=head1 Title

  wordpairdiff.pl --- compare two text files by the frequency of consecutive word pairs

=cut

my $verbose=1;

my $usage = "$0: oldfile.tex newfile.tex";

(@ARGV) or die "$usage\n";
(@ARGV == 2) or die "$usage: need exactly two filenames as arguments\n";

($ARGV[0]) or die "$usage: need first filename\n";
(-e $ARGV[0]) or die "$usage: first file $ARGV[0] does not exist\n";
my $ofnm= $ARGV[0];

($ARGV[1]) or die "$usage: need second filename\n";
(-e $ARGV[1]) or die "$usage: second file $ARGV[1] does not exist\n";
my $nfnm= $ARGV[1];



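## Create consecutive word pairs. The lookahead captures "word whitespace word"
## without consuming it; the trailing \S+ then steps past the first word only,
## so the pairs overlap. The captured whitespace is kept as-is, so re-wrapped
## lines count as changed pairs.
my @npairs = slurp( $nfnm ) =~ /(?=(\S+\s+\S+))\S+/g;
my @opairs = slurp( $ofnm ) =~ /(?=(\S+\s+\S+))\S+/g;

## Net count per pair: +1 for each occurrence in the new file, -1 for each in the old.
my %seen = ();
foreach (@npairs) { ++$seen{$_}; }
foreach (@opairs) { --$seen{$_}; }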

## $pos: surplus pair occurrences in the new file; $neg: surplus in the old file.
my $pos=0; my $neg=0;
foreach my $wpair (keys %seen) {
  ($seen{$wpair} == 0) and next;
  ($seen{$wpair} > 0) and $pos+= $seen{$wpair};
  ($seen{$wpair} < 0) and $neg-= $seen{$wpair};
}

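my $aseen=(scalar keys %seen);  ## number of distinct word pairs seen in either file
my $changes= $pos + $neg;
($aseen) or die "$0: no word pairs found in $ofnm or $nfnm\n";

print "$ofnm vs. $nfnm: ".round(100*(($changes)/($aseen)))."%";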
($verbose) and print "\t(Changes: $changes.  Word Pairs Examined: $aseen, Neg: $neg, Pos: $pos)";
print "\n";

Example usage:

$ wpairdiff oldfile.tex newfile.tex
oldfile.tex vs. newfile.tex: 12%    (Changes: 491.  Word Pairs Examined: 3935, Neg: 306, Pos: 185)
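
The reported percentage can be reproduced from the verbose counts in the sample output above. As the script computes it, the number of changes is the sum of the surplus pair occurrences on each side, and it is divided by the number of distinct word pairs seen in either file:

```latex
\[
\text{Changes} = \text{Neg} + \text{Pos} = 306 + 185 = 491,
\qquad
\frac{\text{Changes}}{\text{Word Pairs Examined}}
  = \frac{491}{3935} \approx 0.1248 \approx 12\%.
\]
```

Because the numerator counts occurrences while the denominator counts distinct pairs, the figure is best read as a rough index of change rather than a strict fraction. With heavily repeated pairs it can exceed 100%.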
