Parsing a file based on ID pairs and higher/lower values

Question 1

Here is an inelegant awk solution:

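{
    # Split columns 3 and 4 into prefix and number at the "|"
    split($3, a, "|")
    split($4, b, "|")
    # Put the pair in canonical order: smaller number first
    if (a[2] > b[2]){
        $3=b[1]"|"b[2]
        $4=a[1]"|"a[2]
    }
    # Look up the line already stored for this pair, if any
    split(arr[$3" "$4], c, " ")
    # Keep the line with the larger value in column 8
    if ($8 > c[8]){
        arr[$3" "$4] = $0
    }
}
END{
    for (item in arr){
        print(arr[item])
    }
}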

Run it with:

awk -f script.awk input

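It does not preserve the original spacing, and the output order is arbitrary: assigning to a field makes awk rebuild that line with single spaces, and for (item in arr) visits the keys in an unspecified order.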
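If a stable output order matters, one option is to remember the order in which each pair is first seen. The following is only a minimal sketch under stated assumptions: the sample input is hypothetical (eight whitespace-separated columns, since the real input file is not shown here), and the `order` array and the counter `n` are names introduced for illustration. Unlike the script above, it always stores the first line seen for a pair.

```shell
# Hypothetical sample input: columns 3 and 4 hold "name|number", column 8 is the value to maximize.
input='r1 x A|2 B|1 p q 5 10
r2 x B|1 A|2 p q 6 30
r3 x C|3 D|4 p q 7 20'

result=$(printf '%s\n' "$input" | awk '
{
    split($3, a, "|")
    split($4, b, "|")
    if (a[2] > b[2]) {            # canonical order: smaller number first
        $3 = b[1] "|" b[2]
        $4 = a[1] "|" a[2]
    }
    key = $3 " " $4
    if (!(key in arr)) {          # first time this pair is seen
        order[++n] = key          # remember its position
        arr[key] = $0
    } else {
        split(arr[key], c, " ")
        if ($8 > c[8]) arr[key] = $0   # keep the larger column-8 value
    }
}
END {
    # Print the winners in first-seen order instead of hash order.
    for (i = 1; i <= n; i++) print arr[order[i]]
}')

printf '%s\n' "$result"
```

With this sample, the first two lines describe the same pair, so only the one with the larger eighth column (r2) is printed, followed by r3.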

Answer

Question 2

Building on pfnuesel's answer:

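{
    # Split columns 3 and 4 into prefix and number at the "|"
    split($3, a, "|")
    split($4, b, "|")
    # Put the pair in canonical order: smaller number first
    if (a[2] > b[2]){
        $3=b[1]"|"b[2]
        $4=a[1]"|"a[2]
    }
    key=$3" "$4
    split(arr[key], c, " ")
    # Prefer the larger column 8; break ties with the larger column 7
    if ($8 > c[8]  ||  ($8 == c[8] && $7 > c[7])){
        arr[key] = $0
    }
}
END{
    for (item in arr){
        print(arr[item])
    }
}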

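As shown in the question (but not stated explicitly), this assumes that the values in the third and fourth columns have the form

some_string | number

where the spaces are for illustration only, and the string does not contain any | characters. The tokens are ordered by their numbers; the string prefixes are not compared.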
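To illustrate that only the numbers decide the order, here is a small sketch on hypothetical two-column data (not taken from the question's file). It adds 0 to each number to force a numeric comparison, which is an extra safeguard and not part of the script above; `tmp` and `normalized` are illustrative names.

```shell
# Hypothetical sample: the same pair written in both orders.
normalized=$(printf '%s\n' 'zeta|3 alpha|12' 'alpha|12 zeta|3' | awk '
{
    split($1, a, "|")
    split($2, b, "|")
    # Adding 0 forces a numeric comparison of the parts after "|".
    if (a[2] + 0 > b[2] + 0) {
        tmp = $1; $1 = $2; $2 = tmp
    }
    print
}')

printf '%s\n' "$normalized"
```

Both input lines come out as `zeta|3 alpha|12`: the token with the smaller number goes first, even though its prefix sorts later alphabetically.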

As in pfnuesel's answer, the usage is

awk -f script.awk file1

The exact spacing of the input file is lost, but readable column spacing can be (re)created by piping the output through column -t; for example,

awk -f script.awk file1 | column -t > file2

Answer