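I needed to find out which chapters I had revised most over the past year. There are of course many metrics for measuring how much a file has changed, but I settled on counting consecutive word pairs. I wanted to share this small utility with anyone who has a similar need.
The program aims for simplicity: it is a quick hack that is also easy to modify.
It serves a different need than the highly recommended latexdiff program. I needed basic difference statistics, not a way to reconcile files.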
Answer 1
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use utf8;
use warnings FATAL => qw{ uninitialized };
use Perl6::Slurp;
use Math::BigFloat;
sub round { Math::BigFloat->new(shift)->bfround(1); }
=pod

=head1 Title

wordpairdiff.pl --- compare two text files by the frequency of consecutive word pairs

=cut
my $verbose=1;
my $usage = "$0: oldfile.tex newfile.tex";
(@ARGV == 2) or die "$usage: need exactly two filenames as arguments\n"; ## rejects missing and extra arguments alike
($ARGV[0]) or die "$usage: need first filename\n";
(-e $ARGV[0]) or die "$usage: first file $ARGV[0] does not exist\n";
my $ofnm= $ARGV[0];
($ARGV[1]) or die "$usage: need second filename\n";
(-e $ARGV[1]) or die "$usage: second file $ARGV[1] does not exist\n";
my $nfnm= $ARGV[1];
my @npairs = slurp( $nfnm ) =~ /(?=(\S+\s+\S+))\S+/g; ## create consecutive word pairs
my @opairs = slurp( $ofnm ) =~ /(?=(\S+\s+\S+))\S+/g;
my %seen = ();
foreach (@npairs) { ++$seen{$_}; }
foreach (@opairs) { --$seen{$_}; }
my $pos=0; my $neg=0;
foreach my $wpair (keys %seen) {
($seen{$wpair} == 0) and next;
($seen{$wpair} > 0) and $pos+= $seen{$wpair};
($seen{$wpair} < 0) and $neg-= $seen{$wpair};
}
my $aseen=(scalar keys %seen);
my $changes= $pos + $neg;
my $percent = ($aseen) ? round(100*$changes/$aseen) : 0; ## avoid dividing by zero when no word pairs were found
print "$ofnm vs. $nfnm: $percent%";
($verbose) and print "\t(Changes: $changes. Word Pairs Examined: $aseen, Neg: $neg, Pos: $pos)";
print "\n";
Example usage:
$ wpairdiff oldfile.tex newfile.tex
oldfile.tex vs. newfile.tex: 12% (Changes: 491. Word Pairs Examined: 3935, Neg: 306, Pos: 185)
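To see how the percentage comes about, here is a minimal sketch that applies the same regex and counting scheme to two short in-memory strings. It uses core Perl only. The helper names word_pairs and pair_diff and the optional $normalize flag are my own illustrations and do not appear in the script above; rounding is done with printf rather than Math::BigFloat.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): split a string into
# overlapping pairs of consecutive words, using the same regex as above.
sub word_pairs {
    my ($text, $normalize) = @_;
    my @pairs = $text =~ /(?=(\S+\s+\S+))\S+/g;
    # Optional assumption: collapse whitespace inside each pair so that
    # re-wrapped lines ("foo\nbar" vs. "foo bar") yield identical pairs.
    if ($normalize) { s/\s+/ /g for @pairs; }
    return @pairs;
}

# Hypothetical helper: returns (changes, distinct pairs, percentage).
sub pair_diff {
    my ($old, $new, $normalize) = @_;
    my %seen;
    $seen{$_}++ for word_pairs($new, $normalize);   # pairs in the new text
    $seen{$_}-- for word_pairs($old, $normalize);   # pairs in the old text
    my $changes = 0;
    $changes += abs($_) for values %seen;           # same as Pos + Neg
    my $distinct = keys %seen;
    my $percent  = $distinct ? 100 * $changes / $distinct : 0;
    return ($changes, $distinct, $percent);
}

my ($changes, $distinct, $percent) =
    pair_diff("the quick brown fox", "the quick red fox");
printf "%.0f%% (Changes: %d. Word Pairs Examined: %d)\n",
    $percent, $changes, $distinct;
```

Replacing one word changes the two pairs it belongs to, so "quick brown" and "brown fox" disappear while "quick red" and "red fox" appear. That gives 4 changes over 5 distinct pairs, and the sketch prints 80% (Changes: 4. Word Pairs Examined: 5). Note that each captured pair keeps its original whitespace, so a pair split across a line break is counted as different from the same two words separated by a space unless something like the $normalize flag is used.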