Awk: combining columns from multiple files with calculations on data spanning multiple rows

Answer 1

#splitting on ":" or ";" (FS="[:;]") makes the hour a separate field ($2)
BEGIN { FS="[:;]" ; OFS="\t" 
        #this gawk feature helps properly addressing the arrays in the end
        PROCINFO["sorted_in"] = "@ind_str_asc"
}

#get device ID from filename on every new file
#get device IDs in array
FNR==1 {devID=FILENAME ; sub(/_.*/,"",devID) ; devs[devID]=devID }

#select time ranges, sum up values per time range and count occurrences
FNR>1 {
    if ($2 >= 8 && $2 <= 12) {
        vals[devID,$1,1030]=vals[devID,$1,1030]+$NF
        n[devID,$1,1030]++
        }
    else if ($2 >= 13 && $2 <= 17) {
        vals[devID,$1,1530]=vals[devID,$1,1530]+$NF
        n[devID,$1,1530]++
        }
#get dates in array
    dates[$1]=$1
}
    
END {
    #needed for value selection
    times[1030]="10:30"
    times[1530]="15:30"
    #print headers
    printf("date\ttime")
    for (dev in devs) {printf("\t%s", dev)}
    printf("\n")

    
    #print values
    for (date in dates) {
    #get day of week from system date command (needs GNU date for -d)
        cmd="date -d"date" +%w"
        cmd | getline dow
    #close the pipe so one does not stay open per date
        close(cmd)
    #do not use Sat+Sun
        if ( dow != 0 && dow != 6 ) {
            for (time in times) {
                printf("%s\t%s", date, times[time])
                for (dev in devs) {
                    #test the counter, so a genuine sum of 0 is not reported as N/A
                    if ( !((dev,date,time) in n) ) { printf("\tN/A") }
                    else { printf("\t%s", vals[dev,date,time]/n[dev,date,time]) }
                }
                printf("\n")
            }
        }
    }   
}

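Maybe not the most elegant, but it does the job. Note that the gawk array traversal option (PROCINFO["sorted_in"]) is required to make sure the device column headers line up with the values.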

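Example output from files named 1_04, 2_04 and 3_02, created from the example input, with some dates added (May 1 and 2 are a weekend and are not selected; more days were added to test "N/A") and some numbers changed (to make sure values and devices match).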

date        time   1    2    3
2021/05/03  10:30  832  N/A  832
2021/05/03  15:30  406  401  406
2021/05/04  10:30  809  809  1009
2021/05/04  15:30  N/A  N/A  N/A
2021/05/06  10:30  N/A  832  N/A
2021/05/06  15:30  N/A  N/A  N/A

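As you can see, it even shows the days or time intervals for which not all devices delivered values. The respective date has to appear in at least one of the log files, though.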
