Bash: computing the average for every 10 minutes of a time period in a file

Answer 1

You can simply use both : and , as field separators, then ignore the seconds and keep only the tens digit of the minutes:

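$ awk -F'[:,]' '{
                # the tens digit of the minutes identifies the 10-minute interval
                thisInterval = substr($2,1,1);
                a[$1":"thisInterval"0"] += $4;   # key looks like "09:20"
              }
              END{
                    PROCINFO["sorted_in"]="@ind_str_asc";   # GNU awk only: sorted key order
                    for(t in a){ print t, a[t]/600 }
              }' input.txt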

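The above requires GNU awk for PROCINFO, but you can always sort the output yourself afterwards. It also assumes that every 10-minute interval contains exactly 600 data points.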
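If some intervals hold fewer than 600 samples, or GNU awk is not available, a variant along the following lines may help. It is only a sketch: it counts the samples per interval instead of dividing by a fixed 600, and it pipes the result through sort instead of relying on PROCINFO. The function name avg_per_10min is made up for illustration, and the input is assumed to consist of hh:mm:ss,value lines.

```shell
# avg_per_10min FILE: print "hh:m0 average" for every 10-minute interval.
# Hypothetical helper; assumes lines of the form hh:mm:ss,value.
avg_per_10min() {
  awk -F'[:,]' '
    {
      key = $1 ":" substr($2, 1, 1) "0"   # e.g. 09:03:00 -> 09:00
      sum[key] += $4
      cnt[key]++
    }
    END {
      # divide by the number of samples actually seen in each interval
      for (key in sum) print key, sum[key] / cnt[key]
    }' "$1" | sort
}
```

It could be called as avg_per_10min input.txt.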

Answer 2

GNU awk approach:

Simplified example testfile:

09:00:00,1
09:03:00,3
09:09:59,6
10:00:00,1
10:02:49,76.77
10:03:50,38.78
10:05:51,23.23
10:07:52,12
10:09:53,26.47
10:09:59,10.2
10:59:55,32.67
10:59:56,14
10:59:57,42
10:59:58,100
10:59:59,100

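awk -F',' 'BEGIN{ d = "9999 01 01 " }   # dummy date so that mktime() can parse the time
          { 
              gsub(":", " ", $1);              # "09:00:00" -> "09 00 00"
              if (!ts) ts = mktime(d $1);      # first record of a new window
              sum += $2; cnt += 1
          }
          cnt == 1 { next }
          (mktime(d $1) - ts) == 599 {         # record exactly 599 s after the window start
              print sum / cnt;
              ts = sum = cnt = 0
          }' testfile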

Output:

3.33333
26.9214
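
The script above prints a window only when a record arrives exactly 599 seconds after the window's first record, so the five trailing 10:59:5x samples in testfile are never reported. As a possible alternative, the sketch below assigns each record to a fixed 10-minute bucket computed from the seconds since midnight and prints the average whenever the bucket changes. It does not need mktime, so it should also run on non-GNU awk. It assumes the input is sorted by time, and the function name bucket_avg is invented here.

```shell
# bucket_avg FILE: print one average per 10-minute bucket, in input order.
# Hypothetical helper; assumes time-sorted lines of the form hh:mm:ss,value.
bucket_avg() {
  awk -F',' '
    {
      split($1, t, ":")
      # seconds since midnight, divided into 600-second buckets
      bucket = int((t[1] * 3600 + t[2] * 60 + t[3]) / 600)
      if (cnt && bucket != prev) {   # a new bucket starts: flush the old one
        print sum / cnt
        sum = cnt = 0
      }
      prev = bucket
      sum += $2
      cnt++
    }
    END { if (cnt) print sum / cnt }   # flush the last bucket
  ' "$1"
}
```

For the testfile above, bucket_avg testfile should print 3.33333, 26.9214 and 57.734, the last value covering the 10:59:55 to 10:59:59 samples.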

Answer 3

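You can match "0:00" in the timestamp to detect the start of a new ten-minute period. Here is an example in pure bash. It only handles integer values, but since computing the average is not the part you are stuck on, you should be able to adapt it.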

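#!/bin/bash

SUM=0
COUNT=0
while IFS= read -r line
do
  # a timestamp of the form "hh:m0:00" starts a new 10-minute period
  if [ "${line:4:4}" = "0:00" ] && [ "$COUNT" -gt 0 ]
  then
    # save the (integer) average of the period that just ended
    echo $((SUM / COUNT)) >> results.txt

    # reset the accumulators
    SUM=0
    COUNT=0
  fi

  # add this line's value (everything after "hh:mm:ss,") to the sum
  SUM=$((SUM + ${line:9}))
  COUNT=$((COUNT + 1))
done < input.txt

# save the average of the last period, which no later "0:00" line closes
if [ "$COUNT" -gt 0 ]
then
  echo $((SUM / COUNT)) >> results.txt
fi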
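
To adapt the integer-only loop to decimal values without leaving bash, one option is fixed-point arithmetic: store every value as an integer number of hundredths. The sketch below assumes non-negative values that have an integer part, and it truncates anything beyond two decimal places. The helper names to_hundredths and format_hundredths are invented for this example.

```shell
#!/bin/bash
# Convert a decimal string such as "76.77" or "10.2" to integer hundredths.
to_hundredths() {
  local v=$1 int frac=
  int=${v%%.*}                      # digits before the decimal point
  [[ $v == *.* ]] && frac=${v#*.}   # digits after it, if any
  frac=${frac}00                    # pad to at least two digits
  frac=${frac:0:2}                  # keep exactly two, truncating the rest
  echo $(( 10#$int * 100 + 10#$frac ))   # 10# avoids octal parsing of "08"
}

# Turn integer hundredths back into a decimal string.
format_hundredths() {
  printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

# Example: average of 1, 76.77 and 10.2
sum=0
count=0
for v in 1 76.77 10.2; do
  sum=$(( sum + $(to_hundredths "$v") ))
  count=$(( count + 1 ))
done
format_hundredths $(( sum / count ))   # prints 29.32
```

Inside the loop above, SUM would then accumulate to_hundredths "${line:9}" and the result would be printed with format_hundredths.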

Answer 4

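10 minutes = 600 seconds, so I decided to simply sum the second field over every block of 600 lines and, on reaching each 600th line, print that sum divided by 600.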

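awk -F, '
NR % 600 == 1 {
    start = $1          # first timestamp of the current block of 600 lines
}
{
    sum += $2           # accumulate before testing for the end of the block
}
NR % 600 == 0 {
    printf("%s - %s => %f\n", start, $1, sum / 600)
    sum = 0
}
' input.txt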

Output

09:00:00 - 09:09:59 => 49.807600
09:10:00 - 09:19:59 => 50.171900
09:20:00 - 09:29:59 => 47.775433
09:30:00 - 09:39:59 => 48.605350
09:40:00 - 09:49:59 => 49.591117
...
13:20:00 - 13:29:59 => 50.347733
13:30:00 - 13:39:59 => 50.321833
13:40:00 - 13:49:59 => 49.923333
13:50:00 - 13:59:59 => 48.644683
14:00:00 - 14:09:59 => 49.957433
...
16:00:00 - 16:09:59 => 50.333633
16:10:00 - 16:19:59 => 51.799317
16:20:00 - 16:29:59 => 50.931450
16:30:00 - 16:39:59 => 50.734167
16:40:00 - 16:49:59 => 49.857383
16:50:00 - 16:59:59 => 50.433733
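
Counting lines only works if input.txt really has exactly one record per second, with no gaps or duplicates. A quick check along the following lines could be run first. It is a sketch, and check_contiguous is a hypothetical helper name: it reports the first line whose timestamp is not exactly one second after the previous one and returns a non-zero status in that case.

```shell
# check_contiguous FILE: succeed only if every record follows the previous
# one by exactly one second. Assumes lines of the form hh:mm:ss,value.
check_contiguous() {
  awk -F',' '
    {
      split($1, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
      if (NR > 1 && now != prev + 1) {
        printf("line %d: %s is not 1 second after the previous record\n", NR, $1)
        bad = 1
        exit                                  # jumps to the END rule
      }
      prev = now
    }
    END { exit bad }
  ' "$1"
}
```

For example, check_contiguous input.txt && echo "one record per second" prints the message only when the assumption holds.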

To generate input.txt I wrote two programs; use whichever you prefer. The second one is faster.

First:

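# 32400 and 61199 are 09:00:00 and 16:59:59 in seconds since midnight;
# TZ=UTC keeps date from shifting them by the local time zone offset
TZ=UTC date -f <(seq -f '@%g' 32400 61199) '+%H:%M:%S' |
awk '{
    printf("%s,%.2f\n", $0, rand() * 100)
}'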

Second:

awk '
BEGIN {
    for(i = 9; i < 17; i++) {
        for(j = 0; j < 60; j++) {
            for(k = 0; k < 60; k++) {
                printf("%02d:%02d:%02d,%.2f\n", i, j, k, rand() * 100)  
            }
        }
    }
}'
