I want to remove duplicate/inverse matching pairs from a text file.
For example, the file contains data like this:
10 |Name1 |20 |Name2
20 |Name2 |30 |Name3
20 |Name2 |10 |Name1 <-- Inverse pair (compared to first line) to be removed from text file
40 |Name4 |30 |Name3
I expect the following output:
10 |Name1 |20 |Name2
20 |Name2 |30 |Name3
40 |Name4 |30 |Name3
Answer 1
Using awk:
awk -F '[[:blank:]]*[|][[:blank:]]*' -v SUBSEP='|' '
($1,$2,$3,$4) in seen || ($3,$4,$1,$2) in seen {next}
{seen[$1,$2,$3,$4]; print}
' file
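This sets the field separator to a pipe character with optional leading or trailing blanks, and sets SUBSEP so that the four fields are joined with "|" when they are used as a single array key. It then looks up the pair of pairs, in either order, as an associative array key: if either order is found, the line is skipped; otherwise the key is added to the array and the line is printed.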
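A possible variation, offered only as a sketch: instead of testing both orders, build one canonical key by always putting the pair that compares smaller (as a string) first, so a row and its inverse map to the same key. The function name `dedupe_pairs` is hypothetical, and `[ \t]` is used in place of `[[:blank:]]` on the assumption that some older awk builds do not support POSIX character classes.

```shell
# Hypothetical helper: drop rows whose two pairs were already seen in either order.
dedupe_pairs() {
  awk -F '[ \t]*[|][ \t]*' '
    {
      left  = $1 "|" $2
      right = $3 "|" $4
      # Canonical key: the pair that sorts first (string comparison) goes first.
      key = (left < right) ? (left SUBSEP right) : (right SUBSEP left)
    }
    # Print only the first row seen for each canonical key.
    !(key in seen) { seen[key]; print }
  ' "$@"
}

# Usage with the sample data from the question (reads stdin when no file is given).
dedupe_pairs <<'EOF'
10 |Name1 |20 |Name2
20 |Name2 |30 |Name3
20 |Name2 |10 |Name1
40 |Name4 |30 |Name3
EOF
```

With the sample above, the third row should be dropped because its canonical key equals that of the first row. Like the original command, this prints the surviving lines unchanged.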
Answer 2
Perhaps something like this:
perl -F'[|]' -lane '
for (@F) {
# trim the fields to remove leading and trailing blanks
s/^\s+//; s/\s+$//
}
# re-join the trimmed fields into $a
my $a = join "|", @F[0..3];
# same, inverting the two pairs into $b
my $b = join "|", @F[2,3,0,1];
# print unless either $a or $b has been seen before
print unless $seen{$a} || $seen{$b}++' < your-file
To generalize to any number of pairs found in any order, you would sort the pairs to form the key of the %seen associative array:
perl -F'[|]' -lane '
for (@F) {
# trim the fields to remove leading and trailing blanks
s/^\s+//; s/\s+$//
}
my @pairs;
while (my ($a, $b) = splice(@F, 0, 2)) {
push @pairs, "$a|$b"
}
my $key = join "|", sort @pairs;
print unless $seen{$key}++' < your-file
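For readers who prefer awk, the same sorted-key idea can be sketched as below. This is an illustrative port, not part of the answer above: the function name `dedupe_pair_sets` is hypothetical, it assumes an even number of fields per row (a trailing unpaired field is ignored), and it uses a hand-written insertion sort because POSIX awk has no built-in sort.

```shell
# Hypothetical helper: treat each row as an unordered set of (id, name) pairs.
dedupe_pair_sets() {
  awk -F '[ \t]*[|][ \t]*' '
    {
      # Collect the fields two at a time into pair[1..n].
      n = 0
      for (i = 1; i < NF; i += 2) pair[++n] = $i "|" $(i + 1)

      # Insertion sort on the pair strings (string comparison).
      for (i = 2; i <= n; i++) {
        p = pair[i]
        for (j = i - 1; j >= 1 && pair[j] > p; j--) pair[j + 1] = pair[j]
        pair[j + 1] = p
      }

      # Join the sorted pairs into one key.
      key = ""
      for (i = 1; i <= n; i++) key = key pair[i] SUBSEP
    }
    # Print only the first row seen for each set of pairs.
    !(key in seen) { seen[key]; print }
  ' "$@"
}

# Usage: the second row holds the same three pairs as the first, in another order.
printf '%s\n' '1 |a |2 |b |3 |c' '3 |c |1 |a |2 |b' | dedupe_pair_sets
```

Sorting the pairs means the order in which they appear within a row no longer affects the key, which is what lets one lookup replace a check of every permutation.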
Answer 3
You can do this with sed. The code below uses GNU sed, but it could easily be made POSIX-compliant.
sed -Ee '
$!{
s/$/|/
N
s/[[:blank:]]+//g
H;s/.*//;x;D
}
s/$/|/
G;H;g
y/\n_/_\n/
:xdup
s/_((([^_|]+[|]){2})(([^_|]+[|]){2}))_(.*_)?\4\2_/_\1_\6/
txdup
s/^_//;s/_$//
y/\n_/_\n/
' input.csv
Will explain later, as rush hour is approaching.
Result:
10|Name1|20|Name2|
20|Name2|30|Name3|
40|Name4|30|Name3|
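Note that this result differs from the output requested in the question: the blanks are removed and each line ends with an extra "|". If the trailing pipe is unwanted, one way to remove it is a second sed pass, sketched here with a hypothetical function name `strip_trailing_pipe`:

```shell
# Hypothetical post-processing step: remove one trailing "|" from each line.
# In a basic regular expression "|" is a literal character, so no escaping is needed.
strip_trailing_pipe() {
  sed 's/|$//' "$@"
}

# Usage: pipe the output of the sed script above through the helper.
printf '%s\n' '10|Name1|20|Name2|' '20|Name2|30|Name3|' | strip_trailing_pipe
```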