删除文本文件中重叠的 IP 范围

删除文本文件中重叠的 IP 范围

为了扩展我的上一个问题,我有一个 IP 范围列表,格式如下:

Long description:20.1.1.0-20.1.1.14
Another description:5.5.5.0-5.5.5.100
Yet another description:20.1.1.0-20.1.1.40

没有重复项,但我想删除任何重叠的 IP 范围。

例如,在上面的示例中,应删除第一行,因为其范围已包含在第三行中。

笔记:我需要保留整行(包括说明),而不仅仅是 IP 范围。

答案1

如果您同意输入行重新排序,我有一个使用 GNU Awk 和“sort”命令的相对简单的解决方案。基本思想是将 IP 地址转换为单个数字而不是点对,这使得比较它们变得非常容易,并使用-k排序标志,该标志允许指定它应该只对特定字段进行排序。

为了紧凑性,这还使用了协进程的 GNU awk 功能,这使得在使用之前和之后处理数据变得非常容易sort

编辑:这个答案的原始版本中的命令sort行略有错误:sort -k2,3r实际上将字段23视为单个键,以相反的顺序排序。sort -k2,2n -k3,3rn将执行必要的操作,首先按字段排序2并使用(反向)字段3作为决胜局:

# Run as: gawk -F: -f <thisfile.awk> <input file>
BEGIN {
  # Define the sort command that we will be using later as a variable
  # Sort by
  #   - the 1st ip, smallest-to-largest
  #   - the 2nd ip, largest-to-smallest
  sort="sort -n -t: -k2,2n -k3,3nr";
}

# For every line:
{
  # Store the individual components of the addresses into 'ips'
  match($2, /([[:digit:]]+).([[:digit:]]+).([[:digit:]]+).([[:digit:]]+)\
-([[:digit:]]+).([[:digit:]]+).([[:digit:]]+).([[:digit:]]+)/, ips);
  # Add the components together to get the IPs as a single number.
  # The print also uses : as the delimiter between the 2 IPS for simplicity
  print $1":"ips[4]+256*(ips[3]+256*(ips[2]+256*ips[1])) \
          ":"ips[8]+256*(ips[7]+256*(ips[6]+256*ips[5])) \
    |& sort
}

# After sending all lines to sort in the appropriate format
END {
  # Close sort's input stream, so that we can read its output
  close(sort, "to");
  # Keep track of the upper end of the previous range
  prevHigh=0;
  # Read & field-split all lines from sort's output
  while((sort |& getline) > 0) {
     # One range is contained in another if its low address is >= the
     # other's (guaranteed by the sort command) and its high address is <=
     # the other's. So, we should print this record when its high address is >
     # prevHigh:
    if ($3 > prevHigh) {
      print $1":"int($2/(256*256*256))%256"."int($2/(256*256))%256"." \
                 int($2/256)%256"."$2%256 \
              "-"int($3/(256*256*256))%256"."int($3/(256*256))%256"." \
                 int($3/256)%256"."$3%256 \
      # This is now the previous range
      prevHigh = $3
    }
  }
}

相关内容