如何合并和修改两个文件的列

如何合并和修改两个文件的列

我需要根据第一列和第二列合并两个文件。我需要添加第三列和第四列之间的差异,并添加第二列每种类型的 TOTAL 行。这是两个输入文件:

文件一:

VALIDATION_DATA_DBF           DELETE  226  6.4
TXT_DBF                       DELETE  80   0.15
DEFAULT_PROFILE               SELECT  45   1.2
TRAINING_DBF                  SELECT  130  5.25
TESTING_DBF                   SELECT  5    0.001
WARNING_DBF                   UPDATE  8    0.055
WARNING_DBF                   INSERT  5    2.4

文件2:

VALIDATION_DATA_DBF           DELETE  200  7.4
TXT_DBF                       DELETE  70   1.15
DEFAULT_PROFILE               SELECT  40   0.2
TRAINING_DBF                  SELECT  135  7.25
TESTING_DBF                   SELECT  7    0.009
PERF_DBF                      SELECT  10   0.004
WARNING_DBF                   UPDATE  2    1.055

合并的输出文件应如下所示:

TABLE                TYPE    COUNT1 COUNT2 DIFF_COUNT TIME1 TIME2 DIFF_TIME
VALIDATION_DATA_DBF  DELETE  226    200    26         6.4    7.4    -1
TXT_DBF              DELETE  80     70     10         0.15   1.15   -1
TOTAL                DELETE  306    270    36         6.55   8.55   -2
DEFAULT_PROFILE      SELECT  45     40     5          1.2    0.2     1   
TRAINING_DBF         SELECT  130    135    -5         5.25   7.25   -2
TESTING_DBF          SELECT  5      7      -2         0.001  0.009  -0.008
PERF_DBF             SELECT  0      10     -10        0      0.004  -0.004
TOTAL                SELECT  180    192    -12        6.451  7.463  -1.012 
WARNING_DBF          UPDATE  8      2      6          0.055  1.055  -1
TOTAL                UPDATE  8      2      6          0.055  1.055  -1
WARNING_DBF          INSERT  5      0      5          2.4    0      2.4
TOTAL                INSERT  5      0      5          2.4    0      2.4   

答案1

对数组数组和 ARGIND 使用 GNU awk:

awk '
    {
        counts[$2][$1][ARGIND] = $3
        times[$2][$1][ARGIND] = $4
    }
    END {
        print "TABLE", "TYPE", \
            "COUNT1", "COUNT2", "DIFF_COUNT", \
            "TIME1", "TIME2", "DIFF_TIME"
    
        for ( type in counts ) {
            delete totCounts
            delete totTimes
            for ( table in counts[type] ) {
                print table, type,                                   \
                    counts[type][table][1]+0,                        \
                    counts[type][table][2]+0,                        \
                    counts[type][table][1] - counts[type][table][2], \
                    times[type][table][1]+0,                         \
                    times[type][table][2]+0,                         \
                    times[type][table][1] - times[type][table][2]
    
                totCounts[1] += counts[type][table][1]
                totCounts[2] += counts[type][table][2]
                totTimes[1]  += times[type][table][1]
                totTimes[2]  += times[type][table][2]
            }
            print "TOTAL", type, \
                totCounts[1], totCounts[2], totCounts[1] - totCounts[2], \
                totTimes[1],  totTimes[2],  totTimes[1]  - totTimes[2]
        }
    }
' file1 file2 | column -t
TABLE                TYPE    COUNT1  COUNT2  DIFF_COUNT  TIME1  TIME2  DIFF_TIME
VALIDATION_DATA_DBF  DELETE  226     200     26          6.4    7.4    -1
TXT_DBF              DELETE  80      70      10          0.15   1.15   -1
TOTAL                DELETE  306     270     36          6.55   8.55   -2
WARNING_DBF          UPDATE  8       2       6           0.055  1.055  -1
TOTAL                UPDATE  8       2       6           0.055  1.055  -1
WARNING_DBF          INSERT  5       0       5           2.4    0      2.4
TOTAL                INSERT  5       0       5           2.4    0      2.4
PERF_DBF             SELECT  0       10      -10         0      0.004  -0.004
TESTING_DBF          SELECT  5       7       -2          0.001  0.009  -0.008
TRAINING_DBF         SELECT  130     135     -5          5.25   7.25   -2
DEFAULT_PROFILE      SELECT  45      40      5           1.2    0.2    1
TOTAL                SELECT  180     192     -12         6.451  7.463  -1.012

答案2

您需要按文件、表格和类型保存计数和时间。最好使用两个数组,加上一个将表附加到类型。在END部分,评估完成;没有太多逻辑,但是要输入一堆文本......尝试

awk '
FNR == 1        {FN++
                }
                {CNT[FN,$1,$2] = $3
                 TIM[FN,$1,$2] = $4
                 if (!index(TYP[$2], $1)) TYP[$2] = TYP[$2] $1 FS
                }
END             {print "TABLE               TYPE    COUNT1 COUNT2 DIFF_COUNT  TIME1 TIME2 DIFF_TIME"
                 for (t in TYP) {TC1 = TC2 = TDF = TT1 = TT2 = TTF = 0
                                 for (m=split(TYP[t], T); m; m--)       {TBL = T[m]
                                                                         IX1 = 1 SUBSEP TBL SUBSEP t
                                                                         IX2 = 2 SUBSEP TBL SUBSEP t
                                                                         printf "%-20s%-8s%-7d%-7d%-10d  %-6g%-6g%-6g\n", TBL, t, CNT[IX1]+0, CNT[IX2]+0, CNT[
                                                                         TC1 += CNT[IX1]
                                                                         TC2 += CNT[IX2]
                                                                         TDF += CNT[IX1] - CNT[IX2]
                                                                         TT1 += TIM[IX1]
                                                                         TT2 += TIM[IX2]
                                                                         TTF += TIM[IX1] - TIM[IX2]
                                                                        }
                                 printf "%-20s%-8s%-7d%-7d%-10d  %-6g%-6g%-6g \n",  "TOTAL", t, TC1, TC2, TDF, TT1, TT2, TTF
                                }
                }
' OFS="\t" file[12]

TABLE               TYPE    COUNT1 COUNT2 DIFF_COUNT  TIME1 TIME2 DIFF_TIME
TXT_DBF             DELETE  80     70     10          0.15  1.15  -1    
VALIDATION_DATA_DBF DELETE  226    200    26          6.4   7.4   -1    
TOTAL               DELETE  306    270    36          6.55  8.55  -2     
WARNING_DBF         INSERT  5      0      5           2.4   0     2.4   
TOTAL               INSERT  5      0      5           2.4   0     2.4    
PERF_DBF            SELECT  0      10     -10         0     0.004 -0.004
TESTING_DBF         SELECT  5      7      -2          0.001 0.009 -0.008
TRAINING_DBF        SELECT  130    135    -5          5.25  7.25  -2    
DEFAULT_PROFILE     SELECT  45     40     5           1.2   0.2   1     
TOTAL               SELECT  180    192    -12         6.451 7.463 -1.012 
WARNING_DBF         UPDATE  8      2      6           0.055 1.055 -1    
TOTAL               UPDATE  8      2      6           0.055 1.055 -1     

相关内容