我需要根据第一列和第二列合并两个文件。我需要添加第三列和第四列之间的差异,并添加第二列每种类型的 TOTAL 行。这是两个输入文件:
文件一:
VALIDATION_DATA_DBF DELETE 226 6.4
TXT_DBF DELETE 80 0.15
DEFAULT_PROFILE SELECT 45 1.2
TRAINING_DBF SELECT 130 5.25
TESTING_DBF SELECT 5 0.001
WARNING_DBF UPDATE 8 0.055
WARNING_DBF INSERT 5 2.4
文件2:
VALIDATION_DATA_DBF DELETE 200 7.4
TXT_DBF DELETE 70 1.15
DEFAULT_PROFILE SELECT 40 0.2
TRAINING_DBF SELECT 135 7.25
TESTING_DBF SELECT 7 0.009
PERF_DBF SELECT 10 0.004
WARNING_DBF UPDATE 2 1.055
合并的输出文件应如下所示:
TABLE TYPE COUNT1 COUNT2 DIFF_COUNT TIME1 TIME2 DIFF_TIME
VALIDATION_DATA_DBF DELETE 226 200 26 6.4 7.4 -1
TXT_DBF DELETE 80 70 10 0.15 1.15 -1
TOTAL DELETE 306 270 36 6.55 8.55 -2
DEFAULT_PROFILE SELECT 45 40 5 1.2 0.2 1
TRAINING_DBF SELECT 130 135 -5 5.25 7.25 -2
TESTING_DBF SELECT 5 7 -2 0.001 0.009 -0.008
PERF_DBF SELECT 0 10 -10 0 0.004 -0.004
TOTAL SELECT 180 192 -12 6.451 7.463 -1.012
WARNING_DBF UPDATE 8 2 6 0.055 1.055 -1
TOTAL UPDATE 8 2 6 0.055 1.055 -1
WARNING_DBF INSERT 5 0 5 2.4 0 2.4
TOTAL INSERT 5 0 5 2.4 0 2.4
答案1
对数组数组和 ARGIND 使用 GNU awk:
awk '
{
counts[$2][$1][ARGIND] = $3
times[$2][$1][ARGIND] = $4
}
END {
print "TABLE", "TYPE", \
"COUNT1", "COUNT2", "DIFF_COUNT", \
"TIME1", "TIME2", "DIFF_TIME"
for ( type in counts ) {
delete totCounts
delete totTimes
for ( table in counts[type] ) {
print table, type, \
counts[type][table][1]+0, \
counts[type][table][2]+0, \
counts[type][table][1] - counts[type][table][2], \
times[type][table][1]+0, \
times[type][table][2]+0, \
times[type][table][1] - times[type][table][2]
totCounts[1] += counts[type][table][1]
totCounts[2] += counts[type][table][2]
totTimes[1] += times[type][table][1]
totTimes[2] += times[type][table][2]
}
print "TOTAL", type, \
totCounts[1], totCounts[2], totCounts[1] - totCounts[2], \
totTimes[1], totTimes[2], totTimes[1] - totTimes[2]
}
}
' file1 file2 | column -t
TABLE TYPE COUNT1 COUNT2 DIFF_COUNT TIME1 TIME2 DIFF_TIME
VALIDATION_DATA_DBF DELETE 226 200 26 6.4 7.4 -1
TXT_DBF DELETE 80 70 10 0.15 1.15 -1
TOTAL DELETE 306 270 36 6.55 8.55 -2
WARNING_DBF UPDATE 8 2 6 0.055 1.055 -1
TOTAL UPDATE 8 2 6 0.055 1.055 -1
WARNING_DBF INSERT 5 0 5 2.4 0 2.4
TOTAL INSERT 5 0 5 2.4 0 2.4
PERF_DBF SELECT 0 10 -10 0 0.004 -0.004
TESTING_DBF SELECT 5 7 -2 0.001 0.009 -0.008
TRAINING_DBF SELECT 130 135 -5 5.25 7.25 -2
DEFAULT_PROFILE SELECT 45 40 5 1.2 0.2 1
TOTAL SELECT 180 192 -12 6.451 7.463 -1.012
答案2
您需要按文件、表格和类型保存计数和时间。最好使用两个数组,加上一个将表附加到类型。在END部分,评估完成;没有太多逻辑,但是要输入一堆文本......尝试
awk '
FNR == 1 {FN++
}
{CNT[FN,$1,$2] = $3
TIM[FN,$1,$2] = $4
if (!index(TYP[$2], $1)) TYP[$2] = TYP[$2] $1 FS
}
END {print "TABLE TYPE COUNT1 COUNT2 DIFF_COUNT TIME1 TIME2 DIFF_TIME"
for (t in TYP) {TC1 = TC2 = TDF = TT1 = TT2 = TTF = 0
for (m=split(TYP[t], T); m; m--) {TBL = T[m]
IX1 = 1 SUBSEP TBL SUBSEP t
IX2 = 2 SUBSEP TBL SUBSEP t
printf "%-20s%-8s%-7d%-7d%-10d %-6g%-6g%-6g\n", TBL, t, CNT[IX1]+0, CNT[IX2]+0, CNT[
TC1 += CNT[IX1]
TC2 += CNT[IX2]
TDF += CNT[IX1] - CNT[IX2]
TT1 += TIM[IX1]
TT2 += TIM[IX2]
TTF += TIM[IX1] - TIM[IX2]
}
printf "%-20s%-8s%-7d%-7d%-10d %-6g%-6g%-6g \n", "TOTAL", t, TC1, TC2, TDF, TT1, TT2, TTF
}
}
' OFS="\t" file[12]
TABLE TYPE COUNT1 COUNT2 DIFF_COUNT TIME1 TIME2 DIFF_TIME
TXT_DBF DELETE 80 70 10 0.15 1.15 -1
VALIDATION_DATA_DBF DELETE 226 200 26 6.4 7.4 -1
TOTAL DELETE 306 270 36 6.55 8.55 -2
WARNING_DBF INSERT 5 0 5 2.4 0 2.4
TOTAL INSERT 5 0 5 2.4 0 2.4
PERF_DBF SELECT 0 10 -10 0 0.004 -0.004
TESTING_DBF SELECT 5 7 -2 0.001 0.009 -0.008
TRAINING_DBF SELECT 130 135 -5 5.25 7.25 -2
DEFAULT_PROFILE SELECT 45 40 5 1.2 0.2 1
TOTAL SELECT 180 192 -12 6.451 7.463 -1.012
WARNING_DBF UPDATE 8 2 6 0.055 1.055 -1
TOTAL UPDATE 8 2 6 0.055 1.055 -1