Merge multiple files by their first column

I have more than fifty files with different names in a directory. For example:

file1:

Type,A,
RR,1,
CD,2,

file2:

Type,B,
CD,2,
FG,3,

file3:

Type,C,
RR,5,
FG,8,
QR,9,

Desired output:

Type,A,B,C,
CD,2,2,,
FG,,3,8,
QR,,,9,
RR,1,,5

I've tried join and paste, but with no luck. Any suggestions?

Answer 1

Here's some fairly tricky GNU awk. It requires GNU awk (gawk) because it uses arrays of arrays:

gawk -F, '
    NR  == 1 {n=1; header[n] = $1}
    FNR == 1 {n++; header[n] = $2; next}

    !($1 in data) {data[$1][1] = $1}
    {data[$1][n] = $2}

    # from https://www.gnu.org/software/gawk/manual/html_node/Join-Function.html
    function join(array, start, end, sep,    result, i)
    {
        if (sep == "")
            sep = " "
        else if (sep == SUBSEP) # magic value
            sep = ""
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result sep array[i]
        return result
    }

    END {
        print join(header, 1, n, FS)
        PROCINFO["sorted_in"] = "@ind_str_asc"   # for sorted output
        for (type in data)
            print join(data[type], 1, n, FS)
    }
' file{1,2,3}
Type,A,B,C
CD,2,2,
FG,,3,8
QR,,,9
RR,1,,5

I'm assuming every file has exactly two columns (key, value), so this isn't completely general.


A version that doesn't depend on GNU awk (tested with mawk):

mawk -F, '
    NR  == 1 {n=1; header[n] = $1}
    FNR == 1 {n++; header[n] = $2; next}
    {key[$1]; data[$1,n] = $2}
    END {
        for (i=1; i<=n; i++)
            printf "%s%s", header[i], (i==n ? ORS : FS)
        for (type in key) {
            printf "%s%s", type, FS
            for (i=2; i<=n; i++)
                printf "%s%s", data[type,i], (i==n ? ORS : FS)
        }
    }
' file{1,2,3}
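In the mawk version, the for (type in key) loop visits the keys in an unspecified order, because POSIX awk does not define array traversal order, so the data rows are not sorted. A minimal sketch, assuming an awk that supports fflush() (gawk, mawk and BusyBox awk do), gets sorted rows without gawk by printing the header first and piping only the data rows through sort. The temporary directory and the file1..file3 contents are stand-ins recreated from the question; with real data you would pass your own files instead.

```shell
#!/bin/sh
# Sketch: merge files on column 1 with a portable awk, sorting rows with sort(1)
# instead of gawk's PROCINFO["sorted_in"] or asorti().
# Assumes every file uses the "key,value," layout shown in the question.

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Recreate the three sample files from the question (stand-in data)
printf 'Type,A,\nRR,1,\nCD,2,\n'        > "$tmp/file1"
printf 'Type,B,\nCD,2,\nFG,3,\n'        > "$tmp/file2"
printf 'Type,C,\nRR,5,\nFG,8,\nQR,9,\n' > "$tmp/file3"

result=$(awk -F, '
    NR  == 1 { n = 1; header[n] = $1 }       # "Type" from the very first line
    FNR == 1 { n++; header[n] = $2; next }   # one column name per file
    { key[$1]; data[$1, n] = $2 }            # remember the value for this file
    END {
        for (i = 1; i <= n; i++)
            printf "%s%s", header[i], (i == n ? ORS : FS)
        fflush()                             # header must reach stdout before sort writes
        cmd = "LC_ALL=C sort"
        for (type in key) {
            row = type
            for (i = 2; i <= n; i++)
                row = row FS data[type, i]   # missing entries become empty fields
            print row | cmd
        }
        close(cmd)                           # wait for sort to finish
    }
' "$tmp/file1" "$tmp/file2" "$tmp/file3")

printf '%s\n' "$result"
```

LC_ALL=C makes the row order byte-wise and independent of the current locale. Sorting outside awk keeps the script portable at the cost of one extra process.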

Answer 2

It isn't particularly hard, even without true multidimensional arrays:

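BEGIN { FS = "," }                          # the input files are comma-separated
/Type/ { type = $2; types[$2] = 1 }         # header line: remember this file's column name
!/Type/ { data[type,$1] = $2; keys[$1] = 1 }
END {
    m = asorti(types)                       # types[1..m] = sorted column names (gawk only)
    value = "Type"
    for (i = 1; i <= m; i++) {
        value = value "," types[i]
    }
    print value
    n = asorti(keys)                        # keys[1..n] = sorted first-column values
    for (i = 1; i <= n; i++) {
        value = keys[i]
        for (k = 1; k <= m; k++) {
            value = value "," data[types[k], keys[i]]   # empty if this file lacks the key
        }
        print value
    }
}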

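However, you still need GNU awk for the sorting function asorti. Note that the columns come out sorted by their header names (A, B, C) rather than in the order the files were read.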
