Merge multiple files based on a common column

Answer 1

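Here is one way to do this in Python. Each file is read into a dictionary keyed by ID. The script then prints one row per ID, filling in 0 where a file has no value for that ID.

Code: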

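import sys

columns = []   # value-column names, in the order the files were given
data = {}      # column name -> {row ID: value}
ids = set()    # every row ID seen in any file

for filename in sys.argv[1:]:
    # Python 3 uses universal newlines by default; the old 'rU' mode was removed in 3.11.
    with open(filename) as f:
        # The header looks like "ID Value1"; its second field is the column name.
        key = next(f).split()[1]
        columns.append(key)
        data[key] = {}
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            try:
                row_id, value = line.split()
                row_id = int(row_id)
            except ValueError as exc:
                raise ValueError(
                    "Problem in {}: '{}'".format(filename, line.rstrip())) from exc
            data[key][row_id] = value
            ids.add(row_id)

# Header row: ID followed by one column per input file.
print('\t'.join(['ID'] + columns))

# One row per ID in numeric order; missing values are filled with '0'.
for row_id in sorted(ids):
    line = [data[column].get(row_id, '0') for column in columns]
    print('\t'.join([str(row_id)] + line))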

Result:

ID  Value1  Value2  Value150
1   40  0   0
2   30  0   71
3   70  50  0
4   0   70  0
9   0   20  98
10  0   0   52
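To test the Python approach without files on disk, the same logic can be wrapped in a function that takes open text streams and returns rows instead of printing them. This is a sketch, not the answer's own code. merge_columns and its fill parameter are hypothetical names. The inputs below are made-up data, chosen so that the printed table reproduces the result above.

```python
import io
from typing import Dict, Iterable, List, TextIO


def merge_columns(handles: Iterable[TextIO], fill: str = "0") -> List[List[str]]:
    """Outer-join "ID VALUE" inputs on their ID column (hypothetical helper)."""
    columns: List[str] = []
    data: Dict[str, Dict[int, str]] = {}
    for f in handles:
        # First line is a header such as "ID Value1"; its second field names the column.
        key = next(f).split()[1]
        columns.append(key)
        data[key] = {}
        for line in f:
            if line.strip():  # ignore blank lines
                row_id, value = line.split()
                data[key][int(row_id)] = value
    # Every ID found in any input, in numeric order (the union of all keys).
    ids = sorted(set().union(*data.values()))
    rows = [["ID"] + columns]
    for row_id in ids:
        rows.append([str(row_id)] + [data[c].get(row_id, fill) for c in columns])
    return rows


# Hypothetical inputs, chosen so that the printed table matches the result above.
files = [
    io.StringIO("ID Value1\n1 40\n2 30\n3 70\n"),
    io.StringIO("ID Value2\n3 50\n4 70\n9 20\n"),
    io.StringIO("ID Value150\n2 71\n9 98\n10 52\n"),
]
for row in merge_columns(files):
    print("\t".join(row))
```

Because the function accepts any iterable of text streams, the command-line version could pass the result of open() for each filename, while tests can pass io.StringIO objects.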

Answer 2

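A Bash solution using command-line tools. A plain glob lists the input files out of order (file10.txt comes before file2.txt), so the files are listed with ls -v, which sorts embedded numbers naturally, and that list is passed to cat.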
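The reason for ls -v shows up clearly in a small natural-sort sketch. natural_key is a hypothetical helper and the file names are illustrative. The resulting order is only similar to ls -v, because GNU version sort has extra rules, so the two are not guaranteed to match in every case.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and digit runs so that digit runs compare as numbers."""
    # re.split with a capturing group keeps the digit runs at the odd indices,
    # so every key alternates str, int, str, ... and stays comparable.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ["file10.txt", "file2.txt", "file1.txt", "file150.txt"]
print(sorted(names))                   # plain string order, as a glob gives in the C locale
print(sorted(names, key=natural_key))  # numeric-aware order, similar to ls -v
```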

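while read -r line; do
    [[ -z "$line" ]] && continue    # skip blank lines
    if [[ "$line" =~ ID ]]; then
        # Header such as "ID Value1": the last field names a new array.
        array=${line##* }
        index+=("$array")
    else
        # Data line "ID VALUE": store VALUE at subscript ID of the current array,
        # e.g. Value1[3]=70.
        eval "$array"'[${line% *}]=${line#* }'
    fi
done <<<"$( cat $(ls -v file[0-9]*.txt) )"

# Header row, padded the same way as the data rows below.
printf '%-4s' ID
for name in "${index[@]}"; do
    printf '%s ' "$name"
done
echo

# Largest ID in any file (header lines count as 0 under sort -n).
max_ind=$( sort -nu file[0-9]*.txt | tail -n1 | cut -d' ' -f1 )

for (( j = 1; j <= max_ind; j++ )); do
    roll=
    for (( i = 0; i < ${#index[@]}; i++ )); do
        # Indirect lookup of ${<column>[j]}; empty when that file lacks ID j.
        value=$( eval 'echo "${'"${index[i]}"'[j]}"' )
        # Left-justify to the column name's width; a missing value becomes 0.
        printf -v cell "%-${#index[i]}s " "${value:-0}"
        roll+=$cell
    done
    # Print the row only if some cell holds something other than 0 or blanks.
    [[ "$roll" =~ [^0\ ] ]] && printf '%-4s%s\n' "$j" "$roll"
done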
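The two answers choose their rows differently. The Python script prints every ID that appears in at least one file. The Bash loop walks every ID from 1 to the largest one and prints a row only if it holds some non-zero value, so an ID whose values are all an explicit 0 is dropped. The sketch below re-expresses the Bash loop in Python to make that rule explicit. dense_rows and the sample data are hypothetical.

```python
from typing import Dict, List


def dense_rows(columns: List[str], data: Dict[str, Dict[int, str]]) -> List[str]:
    """Mimic the Bash loop: walk IDs 1..max, pad each cell, skip all-zero rows."""
    max_id = max((i for values in data.values() for i in values), default=0)
    lines = []
    for j in range(1, max_id + 1):
        # Like printf "%-${#name}s ": left-justify to the column name's width,
        # add one space, and use "0" when the column has no value for j.
        roll = "".join(data[c].get(j, "0").ljust(len(c)) + " " for c in columns)
        # Like [[ "$roll" =~ [^0\ ] ]]: keep the row if any character is not '0' or ' '.
        if any(ch not in "0 " for ch in roll):
            lines.append(f"{j:<4}{roll}")  # like printf '%-4s%s\n'
    return lines


# Hypothetical data: ID 2 appears in no file, and ID 4 only holds an explicit 0.
columns = ["Value1", "Value2"]
data = {"Value1": {1: "40", 3: "70"}, "Value2": {3: "50", 4: "0"}}
for line in dense_rows(columns, data):
    print(line)
```

For data like the result table in Answer 1, where every listed ID has at least one non-zero value, both rules select the same rows. They can differ, for example, when some ID holds an explicit 0 in every file.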
